Abstract

Missing data problems arise in many applied research studies. They may jeopardize statistical inference of the model of interest if the missing mechanism is nonignorable, that is, if the missing mechanism depends on the missing values themselves even conditional on the observed data. With a nonignorable missing mechanism, the model of interest is often not identifiable without imposing further assumptions. We find that even if the missing mechanism has a known parametric form, the model is not identifiable without specifying a parametric outcome distribution. Although it is fundamental for valid statistical inference, identifiability under nonignorable missing mechanisms has not been established for many commonly used models. In this paper, we first demonstrate identifiability of the normal distribution under monotone missing mechanisms. We then extend it to the normal mixture and t mixture models with non-monotone missing mechanisms. We discover that models under the Logistic missing mechanism are less identifiable than those under the Probit missing mechanism. We give necessary and sufficient conditions for identifiability of models under the Logistic missing mechanism, which can sometimes be checked in real data analysis. We illustrate our methods using a series of simulations, and apply them to a real-life dataset.

Keywords: Heavy tail; Logistic model; Missing not at random; Monotone missing mechanism; 
Probit model; Selection model. 


1. INTRODUCTION 

Missing data arise in many biomedical and socioeconomic studies. In the presence of missing data, the observed data may not be representative of the population of interest, especially when the missing mechanism depends on the missing values themselves. For instance, if rich people tend not to respond in a survey, then the average wage computed from the observed data will underestimate the true average wage.

An effective way to overcome this problem is to model the missing mechanism conditional on the observed covariates and the outcome. Using the terminology of Rubin (1976), the missing mechanism is called missing at random (MAR) if it does not depend on the missing values themselves conditional on the observed data, and it is called missing not at random (MNAR) otherwise. In the current literature, a variety of estimation methodologies based on the MAR assumption have been proposed, including likelihood-based inference, imputation, inverse probability weighting, and doubly robust methods. However, MNAR is often the case in practice, when the missingness depends on the missing values even conditional on the observed covariates. Unfortunately, statistical inference becomes quite challenging with data subject to MNAR mechanisms, because the models are often not identifiable based on the observed data. Many authors have studied models under MNAR mechanisms. Among them, the most popular approach is to model the conditional distribution of the missing indicator given the outcome and covariates, termed the selection model (Little and Rubin, 2002). Greenlees et al. (1982) propose maximum likelihood estimators for survey data with missing values, based on a fully parametric Logistic MNAR mechanism. Qin et al. (2002) propose an empirical likelihood estimation procedure for the case with a nonparametric outcome model and a parametric missing mechanism. Ma et al. (2013) study the semiparametric case with a symmetric outcome distribution and a parametric missing mechanism. Although these authors have developed useful estimation methods, the identifiability of their models may not be guaranteed even if the missing mechanism is parametric. Rotnitzky et al. (1998) and Scharfstein and Irizarry (2003) develop methods for conducting sensitivity analysis by assuming completely specified outcome-dependent terms in the missing mechanisms. In his influential paper in econometrics, Heckman (1979) proposes the Heckman Selection Model, consisting of an outcome equation and a selection equation for the latent variable that governs the missing mechanism. In the Heckman Selection Model, the missing mechanism is nonignorable when the bivariate normal error terms of the outcome equation and the selection equation are correlated. Valid statistical inference for all of these methods fundamentally relies on their identifiability. Unfortunately, identifiability does not always hold, even in parametric models.

When an “instrumental variable” is available, i.e., there exists a variable that is associated with the outcome variable but independent of the missingness conditional on the outcome, identifiability of some MNAR models can be achieved. For example, Chen (2001) shows identifiability of a subset of the regression parameters with biased sampling data, Wang et al. (2014) show identifiability of certain nonparametric models with a parametric missing mechanism, and Chen et al. (2009) demonstrate identifiability for binary outcomes with nonignorable missing data. However, it is often not feasible to find such an instrumental variable, and without one, identifiability is not guaranteed in general.

In this paper, we focus on the identifiability of models under MNAR mechanisms. In Section 2, we illustrate with counterexamples the potential difficulty of achieving identifiability under nonparametric outcome models. In Section 3, we prove identifiability of the normal model under the monotone missing mechanism without requiring any instrumental variables. We find that models under the frequently used Logistic missing mechanism are less identifiable than those under the Probit missing mechanism, and give necessary and sufficient conditions for the identifiability of models under the Logistic missing mechanism. In Section 4, we extend the results to normal mixture and t mixture models, which are useful to accommodate more complex data features such as heavy-tailedness and multimodality. In Section 5, we propose a latent monotone missing mechanism and establish its identifiability. In Section 6, we evaluate the finite sample properties of the nonignorable missing data models via a series of simulations. The simulation results show advantages of normal mixture and t mixture models for fitting complex data. In Section 7, we detect a nonignorable missing mechanism and latent components of the outcome variable in an analysis of a real-life dataset on ambulatory expenditure. We conclude in Section 8, and relegate all technical details to the Supplementary Materials.


2. POTENTIAL DIFFICULTY FOR NONPARAMETRIC IDENTIFICATION 

Throughout the paper, we let X denote completely observed covariates, Y denote the outcome 
variable, and R denote the missing indicator of Y with R = 1 if Y is observed and R = 0 otherwise. 
We use lower-case letters to denote realized values of the corresponding random variables, e.g., y for 
a realized value of Y. Suppose the observed data are n independently and identically distributed 
samples, with some values of Y missing. 

The observed data allow us to identify only the observed distribution $P(y, R=1 \mid x)$, which, however, does not suffice to determine the joint distribution $P(y, r \mid x)$ without additional assumptions. There are two equivalent ways to factorize the joint distribution:

$$P(y, r \mid x) = P(y \mid x)\,P(r \mid x, y) = P(r \mid x)\,P(y \mid x, r),$$

with the first one being called the selection model and the second one being called the pattern mixture model (Little and Rubin, 2002). Analogously, the observed distribution permits the following two equivalent factorizations:

$$P(y, R=1 \mid x) = P(y \mid x)\,P(R=1 \mid x, y) = P(R=1 \mid x)\,P(y \mid x, R=1).$$

In this paper, we adopt the selection model factorization for our discussion. Fundamentally, we are interested in identifying the joint distribution $P(y, r \mid x)$ from the observed distribution $P(y, R=1 \mid x)$. We say that the model is identifiable if and only if the joint distribution $P(y, r \mid x)$ can be uniquely determined by the observed distribution $P(y, R=1 \mid x)$, or, equivalently, two models yielding the same observed distribution must have the same joint distribution.

Because of their flexibility, nonparametric models are often used in the missing data literature (e.g., Qin et al., 2002; Ma et al., 2013). However, in general, the outcome distribution is not identifiable without specifying a parametric form for it, even if the missing mechanism is parametric. Below we provide three counterexamples to illustrate this potential difficulty.
Example 1. Consider the following models:

Model 1. $Y \sim \mathrm{Unif}(-0.5, 0.5)$, and $P(R=1 \mid y) = \Phi(y)$, where $\Phi(\cdot)$ is the distribution function of the standard normal distribution;

Model 2. $Y$ has density $2\Phi(y)\,I\{-0.5 < y < 0.5\}$, and $P(R=1 \mid y) = \Phi(0) = 1/2$.

The density in Model 2 is valid because $\Phi(y) + \Phi(-y) = 1$ implies $\int_{-0.5}^{0.5} 2\Phi(y)\,dy = 1$. The models above have the same parametric Probit missing mechanism, but they have different outcome distributions. Nevertheless, they have the same observed distribution because

$$I\{-0.5 < y < 0.5\} \cdot \Phi(y) = 2\Phi(y)\,I\{-0.5 < y < 0.5\} \cdot 1/2.$$

Therefore, if we assume a nonparametric model for the outcome with a Probit missing mechanism, the two models cannot be distinguished by the observed distribution $P(y, R=1)$.
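The identity in Example 1 can be checked numerically. The sketch below uses only the Python standard library; the helper names (`norm_cdf`, `observed_model1`, `observed_model2`, `midpoint_integral`) are our own, hypothetical choices. It evaluates both observed densities $P(y, R=1)$ on a grid inside $(-0.5, 0.5)$ and confirms, with a midpoint rule, that the Model 2 density integrates to one.

```python
import math


def norm_cdf(z: float) -> float:
    """Standard normal distribution function Phi(z), computed via math.erfc."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def observed_model1(y: float) -> float:
    """Model 1: Y ~ Unif(-0.5, 0.5) and P(R=1 | y) = Phi(y)."""
    density = 1.0 if -0.5 < y < 0.5 else 0.0
    return density * norm_cdf(y)


def observed_model2(y: float) -> float:
    """Model 2: density 2*Phi(y) on (-0.5, 0.5) and P(R=1 | y) = Phi(0) = 1/2."""
    density = 2.0 * norm_cdf(y) if -0.5 < y < 0.5 else 0.0
    return density * norm_cdf(0.0)


def midpoint_integral(f, a: float, b: float, n: int = 20000) -> float:
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


grid = [-0.49 + 0.01 * i for i in range(99)]
max_gap = max(abs(observed_model1(y) - observed_model2(y)) for y in grid)
mass_model2 = midpoint_integral(lambda y: 2.0 * norm_cdf(y), -0.5, 0.5)
print(f"max |difference| of observed densities: {max_gap:.2e}")
print(f"total mass of the Model 2 density: {mass_model2:.6f}")
```

A grid check like this is only an illustration; the exact equality follows from the displayed identity.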

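Example 2. Consider the following models, where $\mathrm{Exp}(\lambda)$ denotes the exponential distribution with rate $\lambda$, i.e., with density $\lambda e^{-\lambda y}$ for $y > 0$:

Model 1. $Y \sim \mathrm{Exp}(2)$, $\mathrm{logit}\,P(R=1 \mid y) = -\log 2 + y$;

Model 2. $Y \sim \mathrm{Exp}(1)$, $\mathrm{logit}\,P(R=1 \mid y) = \log 2 - y$.

The models above have the same parametric Logistic missing mechanism, but they have different outcome distributions. They have the same observed distribution because

$$2e^{-2y}\,\frac{e^{-\log 2 + y}}{1 + e^{-\log 2 + y}} = e^{-y}\,\frac{e^{\log 2 - y}}{1 + e^{\log 2 - y}},$$

where both sides equal $2e^{-y}/(2 + e^{y})$. Therefore, they cannot be distinguished by the observed distribution.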
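A minimal numerical check of Example 2, assuming the rate parameterization stated above: the sketch compares the two observed densities with the common simplified form $2e^{-y}/(2 + e^{y})$ on a grid in $(0, 10)$. The function names are illustrative only.

```python
import math


def expit(z: float) -> float:
    """Logistic distribution function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def observed_model1(y: float) -> float:
    """Model 1: Y ~ Exp(rate 2), logit P(R=1 | y) = -log 2 + y."""
    return 2.0 * math.exp(-2.0 * y) * expit(-math.log(2.0) + y)


def observed_model2(y: float) -> float:
    """Model 2: Y ~ Exp(rate 1), logit P(R=1 | y) = log 2 - y."""
    return math.exp(-y) * expit(math.log(2.0) - y)


def common_form(y: float) -> float:
    """Both observed densities simplify to 2 e^{-y} / (2 + e^{y})."""
    return 2.0 * math.exp(-y) / (2.0 + math.exp(y))


ys = [0.05 * i for i in range(1, 200)]
rel_gap = max(
    abs(observed_model1(y) - observed_model2(y)) / common_form(y) for y in ys
)
print(f"max relative difference on (0, 10): {rel_gap:.2e}")
```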

The above counterexamples demonstrate the difficulty of obtaining identifiability of nonparametric outcome models even with parametric missing mechanisms. Unfortunately, even if we further restrict the outcome distributions to be symmetric as in Ma et al. (2013) and assume parametric missing mechanisms, we still cannot identify the mean or the distribution of the outcome.

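Example 3. Consider the following models:

Model 1. $Y \sim N(1, 1)$, $\mathrm{logit}\,P(R=1 \mid y) = -3/2 + y$;

Model 2. $Y \sim N(2, 1)$, $\mathrm{logit}\,P(R=1 \mid y) = 3/2 - y$.

The models above have the same parametric Logistic missing mechanism and symmetric outcome distributions. However, they cannot be distinguished by the observed distribution because

$$\phi(y-1)\,\frac{\exp(-3/2 + y)}{1 + \exp(-3/2 + y)} = \phi(y-2)\,\frac{\exp(3/2 - y)}{1 + \exp(3/2 - y)},$$

where $\phi(\cdot)$ is the density of the standard normal distribution. The equality follows from $\phi(y-1)\exp(y - 3/2) = \phi(y-2)$ and $\exp(3/2 - y)/\{1 + \exp(3/2 - y)\} = 1/\{1 + \exp(y - 3/2)\}$.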
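The same kind of check applies to Example 3. In this sketch (standard-library Python, illustrative names), `observed(y, mu, alpha, beta)` is the observed density for $Y \sim N(\mu, 1)$ with $\mathrm{logit}\,P(R=1 \mid y) = \alpha + \beta y$; the two parameter sets of the example are compared on a grid over $[-4, 6]$.

```python
import math


def norm_pdf(z: float) -> float:
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def expit(z: float) -> float:
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def observed(y: float, mu: float, alpha: float, beta: float) -> float:
    """p(y, R=1) for Y ~ N(mu, 1) with logit P(R=1 | y) = alpha + beta * y."""
    return norm_pdf(y - mu) * expit(alpha + beta * y)


ys = [-4.0 + 0.1 * i for i in range(101)]
rel_gap = max(
    abs(observed(y, 1.0, -1.5, 1.0) - observed(y, 2.0, 1.5, -1.0))
    / observed(y, 2.0, 1.5, -1.0)
    for y in ys
)
print(f"max relative difference on [-4, 6]: {rel_gap:.2e}")
```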
The above counterexamples show that nonparametric or semiparametric outcome models are generally not identifiable, even though the missing mechanisms are parametric. Without identifiability, the estimates of the parameters of nonparametric outcome models may be misleading and of limited interest in practice.

3. NORMAL MODEL WITH NONIGNORABLE MISSING DATA 

With nonignorable missing mechanisms, although identifiability of nonparametric outcome models is often hard to achieve, as illustrated in Section 2, parametric outcome models are more likely to be identified. However, identifiability of many commonly used parametric models has not been established in the literature. In this section, we discuss identifiability of the normal model with nonignorable missing data. We first show the conditions for its identifiability without covariates, and illustrate its non-identifiability under some nonignorable missing mechanisms, such as the commonly used Logistic missing mechanism. We then extend the result to nonignorable missing mechanisms with covariates, and utilize covariates to improve identifiability.

3.1 Identifiability without covariates 

Suppose $Y \sim N(\mu, \sigma^2)$ with the following missing mechanism:

$$P(R=1 \mid y) = F(\alpha + \beta y), \qquad (1)$$

where $F(\cdot)$ is a known and strictly monotone distribution function with support on $(-\infty, +\infty)$. For instance, the standard normal distribution corresponds to the Probit missing mechanism, and the Logistic distribution corresponds to the Logistic missing mechanism. The missing mechanism is MAR if $\beta = 0$, and it is MNAR if $\beta \neq 0$. Mechanism (1) implies that the response probability is monotone in the value of the outcome. For example, in some sensitive questionnaires, people with higher outcome values tend not to respond, and therefore $\beta < 0$.

The following condition on the tail behavior of $F(\cdot)$ plays a central role in our discussion.

Condition A. For any $\delta > 0$, $\lim_{z \to -\infty} F(z)/e^{\delta z} = 0$ or $+\infty$.

Condition A requires that the left tail of the response probability does not decay at an exponential rate. We can verify that the Probit missing mechanism satisfies Condition A, but the Logistic missing mechanism does not, because $\lim_{z \to -\infty} \{e^{z}/(1 + e^{z})\}/e^{z} = 1$.
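The contrast in Condition A can be seen numerically. The sketch below, a finite-$z$ illustration rather than a proof, evaluates the ratio $F(z)/e^{\delta z}$ for the Probit and Logistic links at increasingly negative $z$. For the Logistic link with $\delta = 1$ the ratio settles at 1 (a finite positive limit, so Condition A fails), while for the Probit link it collapses toward 0. The helper names are our own.

```python
import math


def probit_F(z: float) -> float:
    """Standard normal distribution function (Probit link)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def logistic_F(z: float) -> float:
    """Logistic distribution function, written to stay stable for very negative z."""
    if z < 0:
        e = math.exp(z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(-z))


def tail_ratio(F, z: float, delta: float) -> float:
    """F(z) / exp(delta * z); Condition A asks whether this tends to 0 or +inf as z -> -inf."""
    return F(z) / math.exp(delta * z)


for z in (-5.0, -10.0, -20.0, -30.0):
    print(
        z,
        f"probit: {tail_ratio(probit_F, z, 1.0):.3e}",
        f"logistic: {tail_ratio(logistic_F, z, 1.0):.6f}",
    )
```

For the Logistic link, $\delta < 1$ drives the ratio to 0 and $\delta > 1$ drives it to $+\infty$; only $\delta = 1$ gives the finite nonzero limit that violates Condition A.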
Theorem 1. Suppose $Y \sim N(\mu, \sigma^2)$ with missing mechanism (1). Then

(a) $\sigma^2$ and $|\beta|$ are identifiable;

(b) $\mu$, $\sigma^2$, $\alpha$ and $\beta$ are identifiable, if the sign of $\beta$ is known;

(c) $\mu$, $\sigma^2$, $\alpha$ and $\beta$ are identifiable, if Condition A holds.

Result (a) above indicates that the variance of the outcome and the absolute value of its impact on the response probability are always identifiable, but the sign of its impact may not be identifiable. Result (b) implies that all parameters are identifiable if we have prior knowledge about the tendency of the missingness, i.e., the sign of $\beta$. Result (c) shows that all parameters are identifiable if $F(\cdot)$ satisfies Condition A. Example 3 in Section 2 also illustrates that the normal model under the Logistic missing mechanism may not be identifiable. A similar example is presented in Wang et al. (2014).


If $F(\cdot) = T_\nu(\cdot)$, the standard t distribution function with $\nu$ degrees of freedom, then it satisfies Condition A when $\nu$ is known, and thus all parameters are identifiable. The model with $F(\cdot) = T_\nu(\cdot)$ for the missing mechanism is sometimes referred to as the Robit model (Liu, 2004). By varying the degrees of freedom $\nu$, the Robit models can be used to approximate many missing mechanisms. For instance, as $\nu$ tends to infinity, it approximates the Probit model; when $\nu$ is near 7 or 8, it approximates the Logistic model (Mudholkar and George, 1978; Liu, 2004). In fact, we have a stronger conclusion for the Robit model: the degrees of freedom parameter is also identifiable even if it is unknown.


Corollary 1. If $Y \sim N(\mu, \sigma^2)$, $P(R=1 \mid y) = T_\nu(\alpha + \beta y)$, and $\beta \neq 0$, then $\mu$, $\sigma^2$, $\nu$, $\alpha$ and $\beta$ are all identifiable.


3.2 Identifiability with covariates 

When some completely observed covariates $X$ are available, we assume the outcome model

$$Y \mid x \sim N\{\mu(x, \gamma), \sigma^2(x, \theta)\}, \qquad (2)$$

and a generalized additive missing mechanism

$$P(R=1 \mid x, y) = F\{g(x, \alpha) + \beta y\}, \qquad (3)$$

where $F(\cdot)$ is a known and strictly monotone distribution function, $\mu(\cdot,\cdot)$, $\sigma(\cdot,\cdot)$ and $g(\cdot,\cdot)$ have known forms, and $(\gamma, \theta, \alpha, \beta)$ are unknown parameters. Note that $(\gamma, \theta, \alpha)$ may be vectors. We require that $\mu(\cdot, \gamma)$, $\sigma(\cdot, \theta)$ and $g(\cdot, \alpha)$ have one-to-one mappings to $\gamma$, $\theta$ and $\alpha$, respectively. For instance, the linear function $\mu(x, \gamma) = x^{\mathsf T}\gamma$ satisfies this condition.

Theorem 2. Assume that the outcome model is (2) and the missing mechanism is (3). Then

(a) $\theta$ and $|\beta|$ are identifiable;

(b) $\gamma$, $\theta$, $\alpha$ and $\beta$ are identifiable, if the sign of $\beta$ is known;

(c) $\gamma$, $\theta$, $\alpha$ and $\beta$ are identifiable, if Condition A holds;

(d) $\gamma$, $\theta$, $\alpha$ and $\beta$ are identifiable, if the function pair $\{\mu(\cdot, \gamma), \sigma^2(\cdot, \theta)\}$ and the function $g(\cdot, \alpha)$ are linearly uncorrelated, i.e., $a\,\mu(\cdot, \gamma) + b\,\sigma^2(\cdot, \theta) + g(\cdot, \alpha)$ is not identically equal to $c$ for any nonzero vector $(a, b, c)$ and for all $(\gamma, \theta, \alpha)$.

In the above, (a), (b) and (c) parallel the results of Theorem 1. Result (d) gives another sufficient condition for improving identifiability by the observed covariates $X$ when Condition A does not hold. Result (d) does not allow the function $g(\cdot, \alpha)$ in the missing mechanism to be linearly correlated with the function pair $\{\mu(\cdot, \gamma), \sigma^2(\cdot, \theta)\}$, but it allows for dependence between the mean function $\mu(\cdot, \gamma)$ and the variance function $\sigma^2(\cdot, \theta)$. We illustrate Theorem 2 by the following three examples.

Example 4. Assume that

$$Y \mid x \sim N(\gamma_0 + \gamma_1 x, \sigma^2), \qquad P(R=1 \mid x, y) = \Phi(\alpha_0 + \alpha_1 x + \beta y).$$

The Probit missing mechanism satisfies Condition A in Theorem 2, and thus this model is identifiable. We can verify that this model is equivalent to the Heckman Selection Model (Heckman, 1979) up to a different parametrization.


In contrast to the Probit missing mechanism, the Logistic missing mechanism does not satisfy Condition A in Theorem 2, as pointed out before. Many researchers (e.g., Greenlees et al., 1982; Glynn et al., 1986; Qin et al., 2002; Rotnitzky and Robins, 1997) use Logistic missing mechanisms when the outcome is MNAR. However, it should be noted that the normal model with the Logistic missing mechanism may not be identifiable even when some completely observed covariates are available, as shown in the following example.

Example 5. Assume that

$$Y \mid x \sim N(\gamma_0 + \gamma_1 x, \sigma^2), \qquad \mathrm{logit}\,P(R=1 \mid x, y) = \alpha_0 + \alpha_1 x + \beta y.$$

Let $\sigma^2 = 1$ and $\gamma_1 = 0.5$. Then the two different sets of parameters $(\gamma_0, \alpha_0, \alpha_1, \beta) = (0, -2, -1, 2)$ and $(2, 2, 1, -2)$ lead to the same observed distribution. Therefore, the model above with the Logistic missing mechanism is not identifiable.
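The claim in Example 5 can be verified on an $(x, y)$ grid. The sketch below (standard-library Python; `observed` and the parameter tuples are our own illustrative names) computes $p(y, R=1 \mid x)$ under both parameter sets with $\sigma^2 = 1$ and $\gamma_1 = 0.5$, and reports the largest relative discrepancy.

```python
import math


def norm_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def expit(z: float) -> float:
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def observed(x, y, gamma0, alpha0, alpha1, beta, gamma1=0.5, sigma2=1.0):
    """p(y, R=1 | x) for Y | x ~ N(gamma0 + gamma1*x, sigma2) and
    logit P(R=1 | x, y) = alpha0 + alpha1*x + beta*y."""
    sd = math.sqrt(sigma2)
    mean = gamma0 + gamma1 * x
    return norm_pdf((y - mean) / sd) / sd * expit(alpha0 + alpha1 * x + beta * y)


set_a = (0.0, -2.0, -1.0, 2.0)  # (gamma0, alpha0, alpha1, beta)
set_b = (2.0, 2.0, 1.0, -2.0)
points = [(x, -3.0 + 0.25 * j) for x in (-2.0, -1.0, 0.0, 1.0, 2.0) for j in range(33)]
rel_gap = max(
    abs(observed(x, y, *set_a) - observed(x, y, *set_b)) / observed(x, y, *set_b)
    for x, y in points
)
print(f"max relative difference over the (x, y) grid: {rel_gap:.2e}")
```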

Example 6. Assume that

$$Y \mid x \sim N(\gamma_0 + \gamma_1 x, \sigma^2), \qquad P(R=1 \mid x, y) = F(\alpha_0 + \alpha_1 x + \alpha_2 x^2 + \beta y),$$

where $\alpha_2 \neq 0$. Without any assumptions on $F(\cdot)$, the missing mechanism satisfies the condition in Theorem 2(d), because $X$ has a linear impact on $Y$ but a quadratic impact on its missingness. Thus all parameters are identifiable, even though Condition A of Theorem 2 may not be satisfied. For instance, even if $F(\cdot)$ is the Logistic distribution function, the model is still identifiable with covariates $X$. Therefore, the non-identifiability problems in Examples 3 and 5 can be alleviated.

Note that in Example 5, although $(\gamma_0, \alpha_0, \alpha_1, \beta)$ are not identifiable, the value of $\gamma_1$ is unique, which means that the slope of the outcome model is identifiable, but the intercept is not. We will show in the following theorem that this holds in general.

Theorem 3. Assume that

$$Y \mid x \sim N\{\mu(x, \gamma), \sigma^2\}, \qquad P(R=1 \mid x, y) = F\{g(x, \alpha) + \beta y\}.$$

Then the partial derivatives $\partial \mu(x, \gamma)/\partial x$ and $|\partial g(x, \alpha)/\partial x|$ are always identifiable.

Going back to the linear model $Y \mid x \sim N(\gamma_0 + \gamma_1 x, \sigma^2)$ in Example 6, Theorem 3 says that even if $\alpha_2 = 0$, the coefficient of $X$ in the outcome model, $\gamma_1$, is always identifiable, although the intercept $\gamma_0$ may not be. The conclusion of Theorem 3 can be further strengthened if $\mu(x, \gamma)$ and $g(x, \alpha)$ are both linear.
Corollary 2. Assume that

$$Y \mid x \sim N(\gamma_0 + x^{\mathsf T}\gamma_1, \sigma^2), \qquad P(R=1 \mid x, y) = F(\alpha_0 + x^{\mathsf T}\alpha_1 + \beta y).$$

Assume that Condition A fails, i.e., $\lim_{z \to -\infty} F(z)/e^{\delta z} = c$ for some $\delta > 0$ and some $c \in (0, +\infty)$. Define $\tau_1 = 2\delta(\alpha_0 + \beta\gamma_0) + \delta^2\beta^2\sigma^2 + 2\log(c)$ and $\tau_2 = \alpha_1 + \beta\gamma_1$. Then

(a) for a general distribution function $F(\cdot)$, $\tau_1 \neq 0$ or $\tau_2 \neq 0$ is a sufficient condition for identifiability of $(\gamma_0, \gamma_1, \sigma^2, \alpha_0, \alpha_1)$ and $\beta$;

(b) if $F(\cdot)$ is the Logistic distribution function, $\tau_1 \neq 0$ or $\tau_2 \neq 0$ is a necessary and sufficient condition for identifiability of $(\gamma_0, \gamma_1, \sigma^2, \alpha_0, \alpha_1)$ and $\beta$.

The vector $\tau_2$ in the corollary above can be interpreted as the “total impact” of the covariates $X$ on the missingness of the outcome $Y$. However, the scalar $\tau_1$ does not have an apparent interpretation. From Corollary 2, the parameters are not identifiable only in a small subset of the parameter space, i.e., where $\tau_1 = 0$ and $\tau_2 = 0$. In fact, as shown in the proof of Corollary 2, even if $\tau_1 = 0$ and $\tau_2 = 0$, there are at most two possible sets of parameters with the same observed distribution, and they must satisfy $\gamma_1' = \gamma_1$, $\beta' = -\beta$, and $\alpha_1' = -\alpha_1$. In practice, we may select one of the parameter sets based on domain knowledge. From these equations, we can see that the absolute values of the entries of $\alpha_1$ are identifiable, implying that the absolute values of the coefficients of the covariates on the response probability are identifiable. Moreover, we can see from the above discussion that $\tau_1$ and $\tau_2$ are always identifiable. Therefore, we can consistently estimate them and test whether they are equal to zero. As a result, we can test the necessary and sufficient condition in (b) for the identifiability of the model based on the observed data.
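To make Corollary 2 concrete, the sketch below computes $\tau_1$ and $\tau_2$ for the Logistic link, where $F(z)/e^{z} \to 1$ gives $\delta = 1$ and $c = 1$, and builds the candidate second parameter set when both vanish. The relations $\gamma_1' = \gamma_1$, $\beta' = -\beta$ and $\alpha_1' = -\alpha_1$ come from the corollary; the intercept relations $\gamma_0' = \gamma_0 + \beta\sigma^2$ and $\alpha_0' = -\alpha_0$ are our own completing-the-square derivation for the Logistic case, stated here as an assumption of the sketch. Applied to Example 5, it reproduces the second parameter set listed there. The function names are hypothetical.

```python
import math


def corollary2_taus(gamma0, gamma1, sigma2, alpha0, alpha1, beta, delta=1.0, c=1.0):
    """tau1 and tau2 of Corollary 2. The defaults delta = 1, c = 1 correspond to the
    Logistic F, for which F(z) / e^z -> 1 as z -> -infinity."""
    tau1 = (2.0 * delta * (alpha0 + beta * gamma0)
            + delta**2 * beta**2 * sigma2 + 2.0 * math.log(c))
    tau2 = [a + beta * g for a, g in zip(alpha1, gamma1)]
    return tau1, tau2


def mirrored_parameters(gamma0, gamma1, sigma2, alpha0, alpha1, beta):
    """Candidate second parameter set for the Logistic F when tau1 = tau2 = 0.
    gamma1' = gamma1, beta' = -beta, alpha1' = -alpha1 follow Corollary 2;
    gamma0' = gamma0 + beta*sigma2 and alpha0' = -alpha0 are our own derivation
    (an assumption of this sketch). Returns (gamma0, gamma1, sigma2, alpha0, alpha1, beta)."""
    return (gamma0 + beta * sigma2, list(gamma1), sigma2,
            -alpha0, [-a for a in alpha1], -beta)


# Example 5: sigma2 = 1, gamma1 = 0.5, (gamma0, alpha0, alpha1, beta) = (0, -2, -1, 2)
tau1, tau2 = corollary2_taus(0.0, [0.5], 1.0, -2.0, [-1.0], 2.0)
print("tau1 =", tau1, "tau2 =", tau2)
print("mirrored set:", mirrored_parameters(0.0, [0.5], 1.0, -2.0, [-1.0], 2.0))
```

If either $\tau_1$ or some entry of $\tau_2$ is nonzero, Corollary 2(b) says the Logistic model is identifiable and no such mirrored set exists.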

4. NORMAL AND t MIXTURES WITH NONIGNORABLE MISSING DATA 

In this section, we discuss the identifiability of normal mixture and t mixture outcome models with nonignorable missing data. These mixture distributions are useful for modeling outcomes with multiple modes or heavy tails. We first consider the normal mixture model

$$Y \sim \sum_{k=1}^{K} \pi_k N(\mu_k, \sigma_k^2), \qquad \sum_{k=1}^{K} \pi_k = 1, \quad \pi_k > 0, \quad k = 1, \ldots, K. \qquad (4)$$
The model above includes mixtures in both location and scale parameters. If $\mu_k = \mu$ for all $k$, it is a scale mixture; and if $\sigma_k = \sigma$ for all $k$, it is a location mixture. As in Section 3.1, we assume that the missing mechanism is model (1), where $F(\cdot)$ is a strictly monotone distribution function. Our results require $F(\cdot)$ to satisfy some of the following: Condition A discussed before, and Conditions B and C below.


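Condition B. For any $\theta_0 \in (-\infty, +\infty)$, $\theta_1 > 0$, $\delta_1, \delta_2 \geq 0$ and $\delta_1 + \delta_2 \neq 0$, the limit $\lim_{z \to +\infty} \{F(\theta_0 + \theta_1 z) - F(z)\}/\exp(-\delta_1 z^2 - \delta_2 z) = 0$ or $\infty$.

Condition C. For any $\theta_0 \in (-\infty, \infty)$, $\theta_1 > 0$, and $M > 0$, at most one of $\lim_{z \to +\infty} z^M\{F(\theta_0 + \theta_1 z) - F(z)\}$ and $\lim_{z \to -\infty} z^M\{1 - F(\theta_0 + \theta_1 z)/F(z)\}$ is finite and positive.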

Condition B requires the tail behavior of $F(\alpha + \beta z) - F(z)$ to differ from that of any normal or exponential density. It is straightforward to verify that the commonly used Probit and Robit missing data mechanisms satisfy Condition B, but the Logistic missing mechanism does not. Condition C holds for all of the Logistic, Probit and Robit missing mechanisms.

We have the following result on the identifiability of normal mixture models.

Theorem 4. Suppose that the outcome model is a normal mixture (4) with unknown $K$ and $\pi_k$'s, and the missing mechanism is (1) with a known $F(\cdot)$ satisfying Conditions A and B. Then all the parameters $K$, $\{(\pi_k, \mu_k, \sigma_k^2) : k = 1, \ldots, K\}$, $\alpha$ and $\beta$ are identifiable.


Next we discuss identifiability of the t model and its location mixture. A t random variable can be represented as an infinite scale mixture of normal random variables (Little and Rubin, 2002; Liu, 2004), and thus a mixture of t random variables can be viewed as an infinite location and scale mixture of normal random variables. It can accommodate multimodality and heavy-tailedness simultaneously, and in practice we can tune the degrees of freedom and the number of components in the mixture distribution. We consider the outcome model

$$Y \sim \sum_{k=1}^{K} \pi_k t_\nu(\mu_k, \sigma^2), \qquad \sum_{k=1}^{K} \pi_k = 1, \quad \pi_k > 0, \quad k = 1, \ldots, K, \qquad (5)$$

where $t_\nu(\mu_k, \sigma^2)$ is a t random variable with location parameter $\mu_k$, scale parameter $\sigma^2$, and degrees of freedom $\nu$.
Theorem 5. Suppose that the outcome follows a location mixture of t distributions in (5) with unknown $K$, $\pi_k$'s and $\nu$, and the missing mechanism is (1) with a known $F(\cdot)$ satisfying Condition C. Then all the parameters $K$, $\{(\pi_k, \mu_k) : k = 1, \ldots, K\}$, $\sigma^2$, $\nu$, $\alpha$ and $\beta$ are identifiable.

As a special case of the t mixture with $K = 1$, the parameters are also identifiable if the outcome follows a single t distribution.


5. LATENT MONOTONE MISSING MECHANISM 

In this section, we still assume that the outcome $Y$ comes from a normal mixture distribution with $K$ components indexed by parameters $(\mu_k, \sigma_k^2)$ and mixing proportions $\pi_k$ for $k = 1, \ldots, K$. Note that $K$ and the $\pi_k$'s may be unknown. We further allow the missing mechanism to depend on the indicator $G$ of the latent components of the mixture model:

$$P(R=1 \mid y, G=k) = F(\alpha_k + \beta_k y), \qquad (6)$$

where $F(\cdot)$ is a known distribution function, and $\alpha_k$ and $\beta_k$ are unknown parameters depending on the latent groups. Unfortunately, this general missing mechanism lacks identifiability. Even for the case with $\beta_k = 0$, i.e., a latent ignorable missing mechanism (Frangakis and Rubin, 1999), the model is not identifiable, as illustrated in the following example.

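Example 7. Assume that

$$Y \sim \pi_1 N(0, 1) + (1 - \pi_1) N(0, 4), \qquad P(R=1 \mid y, G=k) = F(\alpha_k) \quad (k = 1, 2).$$

Then any two models with $\pi_1 F(\alpha_1) = (1 - \pi_1) F(\alpha_2) = 1/4$ result in the same observed distribution, because under each of them the observed density equals one quarter of the sum of the $N(0, 1)$ and $N(0, 4)$ densities.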
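Example 7 can also be reproduced numerically. In this sketch we take $F$ to be the Logistic distribution function and read $N(0, 4)$ as having variance 4; both are our own choices, and the conclusion does not depend on them. The hypothetical helper `matching_alphas` solves $\pi_1 F(\alpha_1) = (1 - \pi_1) F(\alpha_2) = 1/4$, which requires $1/4 < \pi_1 < 3/4$.

```python
import math


def norm_pdf(y: float, sd: float) -> float:
    """Density of N(0, sd^2) at y."""
    return math.exp(-0.5 * (y / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def expit(z: float) -> float:
    """Logistic distribution function (our choice of F)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Inverse of expit."""
    return math.log(p / (1.0 - p))


def observed(y: float, pi1: float, alpha1: float, alpha2: float) -> float:
    """p(y, R=1) for Y ~ pi1 N(0, 1) + (1 - pi1) N(0, 4) with P(R=1 | y, G=k) = F(alpha_k)."""
    return (pi1 * norm_pdf(y, 1.0) * expit(alpha1)
            + (1.0 - pi1) * norm_pdf(y, 2.0) * expit(alpha2))


def matching_alphas(pi1: float):
    """alpha_1, alpha_2 with pi1 F(alpha_1) = (1 - pi1) F(alpha_2) = 1/4."""
    return logit(0.25 / pi1), logit(0.25 / (1.0 - pi1))


ys = [-5.0 + 0.25 * i for i in range(41)]
curves = {pi1: [observed(y, pi1, *matching_alphas(pi1)) for y in ys] for pi1 in (0.3, 0.5, 0.7)}
max_gap = max(abs(a - b) for a, b in zip(curves[0.3], curves[0.7]))
print(f"max |difference| between pi1 = 0.3 and pi1 = 0.7: {max_gap:.2e}")
```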

Therefore, we should impose some restrictions on model (6). As a motivating example, wage levels vary among different cities or communities, and in a survey an individual will decide whether to respond based on the average wage level of his city or community. In other words, it is the deviation of an individual's wage from the city's average level that directly influences the response probability. The variance of the wages, a measure of the gap between rich and poor people, may also influence the response probability. Based on this intuition, we assume the following latent monotone missing mechanism, which depends on the mean and variance of the latent group:

$$P(R=1 \mid y, G=k) = F\left[\frac{\alpha\,\psi(\sigma_k) + \beta\{y - \kappa(\mu_k)\}}{\varphi(\sigma_k)}\right], \qquad (7)$$
where $\psi(\cdot)$, $\varphi(\cdot)$ and $\kappa(\cdot)$ are known functions. We require that $\kappa(\cdot)$ and $\varphi(\cdot)$ be increasing, and that $\psi(\cdot)$ and $\varphi(\cdot)$ be positive. The common parameters $\alpha$ and $\beta$ represent similar missing data patterns in different latent groups. Given the component indicator of the mixture distribution, the missing mechanism (7) is monotone in the outcome. Averaged over the latent components, however, the missing mechanism may not be monotone.

Example 8. If $\psi(\sigma) = \varphi(\sigma) = 1$ and $\kappa(\mu) = 0$, we have the monotone missing mechanism $P(R=1 \mid y, G=k) = F(\alpha + \beta y)$, as discussed in (1) and Sections 3 and 4.

Example 9. If $\psi(\sigma) = \sigma$, $\varphi(\sigma) = \sigma$ and $\kappa(\mu) = \mu$, we have

$$P(R=1 \mid y, G=k) = F\left(\alpha + \beta\,\frac{y - \mu_k}{\sigma_k}\right).$$

The missing mechanism depends on the standardized deviation from the center of each component, $(y - \mu_k)/\sigma_k$, and the missing proportions are the same for all components.

Example 10. If $\psi(\sigma) = 1$, $\varphi(\sigma) = \sigma$ and $\kappa(\mu) = \mu$, we have

$$P(R=1 \mid y, G=k) = F\left(\frac{\alpha}{\sigma_k} + \beta\,\frac{y - \mu_k}{\sigma_k}\right).$$

The missing mechanism depends on the deviation from the center and the variance of each component, and the missing proportions are larger for components with larger variances. If $\mu_k = \mu$ for all components, then we have a scale mixture of normal distributions. The resulting marginal outcome distribution is a t distribution if, as $K$ increases, the empirical distribution of the $\sigma_k^2$'s approaches a scaled-inverse-$\chi^2$ distribution. We can verify that, under the above missing mechanism with $F(\cdot) = \Phi(\cdot)$, the model is equivalent to the selection-t model (Marchenko and Genton, 2012).

Example 11. If $\varphi(\sigma) = 1$, $\psi(\sigma) = \sigma$ and $\kappa(\mu) = \mu$, we have

$$P(R=1 \mid y, G=k) = F\{\alpha\sigma_k + \beta(y - \mu_k)\}.$$

The missing mechanism depends on the deviation from the center and the variance of each component, and the missing proportions are smaller for components with larger variances.
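The claims about component-wise missing proportions in Examples 9 to 11 can be checked for the Probit link. This sketch assumes that mechanism (7) has the form $F[\{\alpha\psi(\sigma_k) + \beta\{y - \kappa(\mu_k)\}\}/\varphi(\sigma_k)]$ with $F = \Phi$ and $\kappa(\mu) = \mu$, and it assumes $\alpha > 0$ (we use $\alpha = 1$, $\beta = -0.5$ purely for illustration). Writing $Y = \mu_k + \sigma_k Z$, the response rate of component $k$ is $E\{\Phi(a + bZ)\} = \Phi(a/\sqrt{1 + b^2})$ with $a = \alpha\psi(\sigma_k)/\varphi(\sigma_k)$ and $b = \beta\sigma_k/\varphi(\sigma_k)$, a standard normal identity. The closed form is cross-checked against a midpoint-rule integral; all function names are hypothetical.

```python
import math


def norm_cdf(z: float) -> float:
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def latent_response_prob(y, mu_k, sigma_k, alpha, beta, psi, phi, kappa):
    """Assumed form of (7) with F = Phi."""
    return norm_cdf((alpha * psi(sigma_k) + beta * (y - kappa(mu_k))) / phi(sigma_k))


def rate_numeric(mu_k, sigma_k, alpha, beta, psi, phi, kappa, n=4000):
    """P(R=1 | G=k) by the midpoint rule over Y = mu_k + sigma_k * Z, Z in [-8, 8]."""
    h = 16.0 / n
    total = 0.0
    for i in range(n):
        z = -8.0 + (i + 0.5) * h
        y = mu_k + sigma_k * z
        total += norm_pdf(z) * latent_response_prob(y, mu_k, sigma_k, alpha, beta, psi, phi, kappa)
    return total * h


def rate_closed_form(sigma_k, alpha, beta, psi, phi):
    """Same quantity when kappa(mu) = mu: Phi(a / sqrt(1 + b^2))."""
    a = alpha * psi(sigma_k) / phi(sigma_k)
    b = beta * sigma_k / phi(sigma_k)
    return norm_cdf(a / math.sqrt(1.0 + b * b))


def identity(mu):
    return mu


def one(s):
    return 1.0


def same(s):
    return s


examples = {  # name -> (psi, phi)
    "Example 9": (same, same),
    "Example 10": (one, same),
    "Example 11": (same, one),
}
alpha, beta, sigmas = 1.0, -0.5, (0.5, 1.0, 2.0)
for name, (psi, phi) in examples.items():
    rates = [rate_closed_form(s, alpha, beta, psi, phi) for s in sigmas]
    print(name, "response rates for sigma_k in", sigmas, ":", [round(r, 4) for r in rates])
```

With $\alpha > 0$, the printed response rates are constant in $\sigma_k$ for Example 9, decrease with $\sigma_k$ for Example 10 (larger missing proportions), and increase with $\sigma_k$ for Example 11 (smaller missing proportions).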

In the above examples, we suggested several candidate missing mechanisms that are easy to interpret. Practitioners can choose their own missing mechanisms according to their background knowledge, as long as the functions $\{\psi(\cdot), \varphi(\cdot), \kappa(\cdot)\}$ satisfy the conditions above. Below we show the identifiability of model (4) under the missing mechanism (7).

Theorem 6. Suppose that the outcome model is a normal mixture (4) with unknown $K$ and $\pi_k$'s, and the missing mechanism is (7). If $F(\cdot)$ is known and satisfies Conditions A and B, then the parameters $K$, $\{(\pi_k, \mu_k, \sigma_k^2) : k = 1, \ldots, K\}$, $\alpha$ and $\beta$ are identifiable.

The missing mechanism (1) is a special case of the missing mechanism (7), as shown in Example 8, and therefore Theorem 4 can be viewed as a special case of Theorem 6. Similar to Section 3.2, we can incorporate covariates into the model, and establish identifiability results for the parameters conditional on the covariates.


6. SIMULATION STUDIES 

In this section, we use simulations to evaluate the finite sample properties of the models discussed in the previous sections. The data generating models are identifiable according to the theorems and corollaries in the previous sections. For each model, we simulate 1000 independent data sets with sample sizes 500 and 1500, and summarize the results with boxplots.


6.1 Normal outcomes 

We generate the outcome variable $Y \sim N(0, 4)$, and choose Probit, Logistic, and Robit (with $\nu = 2, 16$) missing mechanisms. We choose $\alpha = 1$ and $\beta = 1$ or $2$ for the missing mechanisms. For these settings, the missing data proportions are between 30% and 40%. For each data set, we apply various missing mechanisms to estimate the parameters, including Probit, Logistic, and Robit models with unknown degrees of freedom.
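The data-generating step of this design can be sketched as follows, for the Probit and Logistic links only (the Python standard library has no t distribution function, so the Robit case is omitted). Under the Probit mechanism the marginal missing proportion has the closed form $P(R=0) = 1 - \Phi\{(\alpha + \beta\mu)/\sqrt{1 + \beta^2\sigma^2}\}$, a standard normal identity that is not stated in the text; the sketch compares it with a simulated proportion. The names `simulate` and `probit_missing_prob` are hypothetical.

```python
import math
import random


def norm_cdf(z: float) -> float:
    """Standard normal distribution function (Probit link)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def logistic_cdf(z: float) -> float:
    """Logistic distribution function (Logistic link), stable for large |z|."""
    if z < 0:
        e = math.exp(z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, alpha, beta, mu=0.0, sigma2=4.0, link=norm_cdf, seed=0):
    """Draw Y ~ N(mu, sigma2) and R ~ Bernoulli(link(alpha + beta * Y)).
    Returns a list of (y, r) pairs, with y replaced by None when r == 0."""
    rng = random.Random(seed)
    sd = math.sqrt(sigma2)
    data = []
    for _ in range(n):
        y = rng.gauss(mu, sd)
        r = 1 if rng.random() < link(alpha + beta * y) else 0
        data.append((y if r == 1 else None, r))
    return data


def probit_missing_prob(alpha, beta, mu=0.0, sigma2=4.0):
    """Closed-form P(R=0) under the Probit mechanism."""
    return norm_cdf(-(alpha + beta * mu) / math.sqrt(1.0 + beta**2 * sigma2))


for beta in (1.0, 2.0):
    data = simulate(1500, alpha=1.0, beta=beta, seed=2024)
    empirical = sum(1 for _, r in data if r == 0) / len(data)
    print(f"beta={beta}: closed form {probit_missing_prob(1.0, beta):.3f}, "
          f"simulated {empirical:.3f}")
```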

We show the simulation results only for $\beta = 2$ in Figure 1; those for $\beta = 1$ have similar patterns and are omitted. From Figure 1, we can see that misspecification of the missing mechanism has little influence on estimation of the mean of the outcome, although it has some influence on the estimation of the missing mechanism itself. This is because the quantiles of the normal, Logistic, and t distributions are almost linearly related over a large range (Mudholkar and George, 1978). When the true missing mechanism is Robit with small degrees of freedom and the Probit model is used for estimation, it leads to small biases for estimating $\mu$, but large biases for estimating $\beta$. As the degrees of freedom increase, the biases for both $\mu$ and $\beta$ become smaller, and they can be reduced by increasing the sample size.

The Probit model has an explicit form of the observed-data likelihood, whereas those of the Logistic and Robit models involve integrals. Based on the simulation results, and for computational convenience, we recommend using the Probit missing mechanism to estimate the mean of the outcome if the missing mechanism itself is not of interest. Otherwise, we recommend conducting a sensitivity analysis using the Robit missing mechanism with different degrees of freedom.
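The explicit Probit likelihood mentioned above can be sketched as follows, for $Y \sim N(\mu, \sigma^2)$ with mechanism (1) and $F = \Phi$. Each observed $y$ contributes $\log N(y; \mu, \sigma^2) + \log\Phi(\alpha + \beta y)$, and each missing unit contributes $\log P(R=0)$, where $P(R=0) = \Phi\{-(\alpha + \beta\mu)/\sqrt{1 + \beta^2\sigma^2}\}$ by a standard normal convolution identity. This is a minimal sketch with hypothetical function names, not the authors' estimation code, and it does not guard against $\log\Phi$ underflow for extreme arguments.

```python
import math


def norm_cdf(z: float) -> float:
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def norm_logpdf(y: float, mu: float, sigma2: float) -> float:
    return -0.5 * math.log(2.0 * math.pi * sigma2) - (y - mu) ** 2 / (2.0 * sigma2)


def prob_missing(mu, sigma2, alpha, beta):
    """P(R=0) = E[1 - Phi(alpha + beta*Y)] = Phi(-(alpha + beta*mu) / sqrt(1 + beta^2 sigma2))."""
    return norm_cdf(-(alpha + beta * mu) / math.sqrt(1.0 + beta * beta * sigma2))


def probit_observed_loglik(params, y_observed, n_missing):
    """Observed-data log-likelihood for Y ~ N(mu, sigma2), P(R=1 | y) = Phi(alpha + beta*y).
    params = (mu, sigma2, alpha, beta); y_observed holds the responses with R = 1."""
    mu, sigma2, alpha, beta = params
    ll = sum(norm_logpdf(y, mu, sigma2) + math.log(norm_cdf(alpha + beta * y))
             for y in y_observed)
    return ll + n_missing * math.log(prob_missing(mu, sigma2, alpha, beta))


print(probit_observed_loglik((0.0, 4.0, 1.0, 2.0), [0.3, 1.2, -0.4], 2))
```

In practice one could pass the negative of this function to a generic optimizer such as `scipy.optimize.minimize`, reparameterizing $\sigma^2$ on the log scale to keep it positive; for the Logistic or Robit links the missing-unit term would instead require numerical integration.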


6.2 Scale mixtures of normal and t outcomes 

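We use two-component scale mixtures of normal distributions and t distributions with $\nu = 5$ for the outcome model. We choose both the Probit and latent Probit missing mechanisms. We generate a covariate $X_1 \sim N(1, 1)$, let $X = (1, X_1)^{\mathsf T}$, and then generate outcomes from the following models:

1. Scale mixture of normal outcomes with the Probit missing mechanism:
$$Y \mid x \sim \sum_{k=1}^{2} \pi_k N(x^{\mathsf T}\gamma, \sigma_k^2), \qquad P(R=1 \mid x, y) = \Phi(x^{\mathsf T}\alpha + \beta y);$$

2. Scale mixture of normal outcomes with the latent Probit missing mechanism:
$$Y \mid x \sim \sum_{k=1}^{2} \pi_k N(x^{\mathsf T}\gamma, \sigma_k^2), \qquad P(R=1 \mid x, y, G=k) = \Phi\left\{\frac{x^{\mathsf T}\alpha + \beta(y - x^{\mathsf T}\gamma)}{\sigma_k}\right\};$$

3. t outcome with the Probit missing mechanism:
$$Y \mid x, \sigma \sim N(x^{\mathsf T}\gamma, \sigma^2), \quad \sigma^2 \sim \text{scaled-inverse-}\chi^2_\nu, \qquad P(R=1 \mid x, y) = \Phi(x^{\mathsf T}\alpha + \beta y);$$

4. t outcome with the latent Probit missing mechanism (or, equivalently, the selection-t model):
$$Y \mid x, \sigma \sim N(x^{\mathsf T}\gamma, \sigma^2), \quad \sigma^2 \sim \text{scaled-inverse-}\chi^2_\nu, \qquad P(R=1 \mid x, y, \sigma) = \Phi\left\{\frac{x^{\mathsf T}\alpha + \beta(y - x^{\mathsf T}\gamma)}{\sigma}\right\}. \qquad (8)$$

The true parameters for cases 1 and 2 are $\pi_1 = \pi_2 = 0.5$, $\gamma = (1, 1)^{\mathsf T}$, $\sigma_1^2 = 1$, $\sigma_2^2 = 4$ and $\alpha = (1, 1)^{\mathsf T}$. The true parameters for cases 3 and 4 are $\nu = 5$, $\gamma = (1, -1)^{\mathsf T}$, scale parameter 1, and $\alpha = (1, 1)^{\mathsf T}$. Further, we set $\beta = -0.5$ and $-1$, which result in about 15% and 25% missing values, respectively.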
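A hypothetical generator for cases 1 and 2 above may help make the design concrete. It uses the true values listed in the text and assumes that the latent Probit mechanism of case 2 divides the linear predictor by the component standard deviation $\sigma_k$, as in the selection-t form (8). It is a sketch with our own names, not the authors' simulation code, and it prints the realized missing fractions rather than asserting particular values.

```python
import math
import random


def norm_cdf(z: float) -> float:
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def simulate_case(n, beta, latent, seed=0):
    """Cases 1 (latent=False) and 2 (latent=True): pi1 = pi2 = 0.5, gamma = (1, 1),
    sigma1^2 = 1, sigma2^2 = 4, alpha = (1, 1), X1 ~ N(1, 1), x = (1, X1).
    Returns (x1, y, r) triples with y replaced by None when r == 0."""
    rng = random.Random(seed)
    gamma = (1.0, 1.0)
    alpha = (1.0, 1.0)
    sds = (1.0, 2.0)  # component standard deviations sigma_1 = 1, sigma_2 = 2
    rows = []
    for _ in range(n):
        x1 = rng.gauss(1.0, 1.0)
        mean = gamma[0] + gamma[1] * x1  # x^T gamma
        k = 0 if rng.random() < 0.5 else 1  # latent component G
        y = rng.gauss(mean, sds[k])
        lin = alpha[0] + alpha[1] * x1  # x^T alpha
        if latent:
            # Case 2 (assumed form): Phi[{x^T alpha + beta (y - x^T gamma)} / sigma_k]
            p = norm_cdf((lin + beta * (y - mean)) / sds[k])
        else:
            # Case 1: Phi(x^T alpha + beta y)
            p = norm_cdf(lin + beta * y)
        r = 1 if rng.random() < p else 0
        rows.append((x1, y if r == 1 else None, r))
    return rows


for latent in (False, True):
    for beta in (-0.5, -1.0):
        rows = simulate_case(1500, beta, latent, seed=11)
        missing = sum(1 for row in rows if row[2] == 0) / len(rows)
        print(f"latent={latent}, beta={beta}: missing fraction {missing:.3f}")
```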
We fit the generated datasets with three models: the scale mixture of normal outcomes with the correct missing mechanism, the normal outcome model with the Probit missing mechanism, and the selection-t model. We show the simulation results for $\beta = -0.5$ in Figure 2; those for $\beta = -1$ are similar and omitted. To save space, we present only the results for the coefficients of $X_1$ and $Y$. From Figure 2, when the missing mechanism is Probit (cases 1 and 3), the estimators of the three methods have small biases for the outcome model parameter $\gamma$, but the estimators of the normal outcome model and the selection-t model have large biases for the missing mechanism parameters $(\alpha, \beta)$. This is because the former model cannot accommodate heavy-tailedness of the outcome distribution, and the latter does not have a monotone missing mechanism. Using a scale mixture of normal outcome models, the biases of both the outcome model parameter $\gamma$ and the missing mechanism parameters $(\alpha, \beta)$ become smaller as the sample size increases (from 500 to 1500), but they become even larger using the other two models. When the missing mechanism is a latent Probit model (cases 2 and 4), the selection-t model is the true specification for case 4, and thus it has the smallest biases for $\gamma$, $\alpha$ and $\beta$ overall. But the scale mixture of normal models also works very well for case 4, and its biases for $(\gamma, \alpha, \beta)$ are close to those of the selection-t model. However, for case 2, the selection-t model has large biases for the missing mechanism parameters $(\alpha, \beta)$, which do not decrease as the sample size increases. Among these estimation methods, the normal outcome model performs worst. We therefore recommend scale mixtures of normal models for heavy-tailed outcomes, because they are robust and easy to interpret.

6.3 General mixtures of normal and location mixtures of t outcomes 

We generate data from two-component general mixtures of normal outcome models (mixtures in both location and scale) with both the Probit and the latent Probit missing mechanisms, and from two-component location mixtures of $t$ outcome models with the Probit missing mechanism. We first generate a covariate $X_1 \sim N(1, 1)$, let $X = (1, X_1)^{\mathrm T}$, and then generate the outcome according to the following models:

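1. Normal mixture outcome with the Probit missing mechanism:
$$Y \mid x \sim \sum_{k=1}^{2} \pi_k N(x^{\mathrm T}\gamma_k, \sigma_k^2), \qquad P(R = 1 \mid x, y) = \Phi(x^{\mathrm T}\alpha + \beta y),$$
with true parameters $\pi_1 = \pi_2 = 0.5$, $\gamma_1 = (1, 1)^{\mathrm T}$, $\gamma_2 = (1, -1)^{\mathrm T}$, $\sigma_1^2 = 1$, $\sigma_2^2 = 4$ and $\alpha = (1, 1)^{\mathrm T}$.

2. Normal mixture outcome with the latent Probit missing mechanism:
$$Y \mid x \sim \sum_{k=1}^{2} \pi_k N(x^{\mathrm T}\gamma_k, \sigma_k^2), \qquad P(R = 1 \mid x, y, G = k) = \Phi\left\{\frac{x^{\mathrm T}\alpha + \beta(y - x^{\mathrm T}\gamma_k)}{\sigma_k}\right\},$$
with true parameters $\pi_1 = \pi_2 = 0.5$, $\gamma_1 = (2, 1)^{\mathrm T}$, $\gamma_2 = (-1.5, -2)^{\mathrm T}$, $\sigma_1^2 = 1$, $\sigma_2^2 = 4$ and $\alpha = (1.5, -2)^{\mathrm T}$.

3. Location mixture of $t$ outcome with the Probit missing mechanism:
$$Y \mid x, \sigma \sim \sum_{k=1}^{2} \pi_k N(x^{\mathrm T}\gamma_k, \sigma^2), \quad \sigma^2 \sim \frac{\omega^2\nu}{\chi^2_\nu}, \qquad P(R = 1 \mid x, y) = \Phi(x^{\mathrm T}\alpha + \beta y),$$
with true parameters $\nu = 5$, $\pi_1 = \pi_2 = 0.5$, $\gamma_1 = (1, 3)^{\mathrm T}$, $\gamma_2 = (1, -3)^{\mathrm T}$, $\omega^2 = 1$ and $\alpha = (1, 1)^{\mathrm T}$.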


We choose $\beta = -0.5$ and $-1$ for the missing mechanisms. Under these settings, the missing data proportions are between 20% and 50%. Note that these three simulations differ from those in the previous subsection because the outcome models here allow for location mixtures, which usually lead to multimodality and involve more parameters.

We fit the datasets with a two-component general mixture of normal outcome models, using the Probit missing mechanism for cases 1 and 3 and the latent Probit missing mechanism for case 2. We also fit the datasets with the normal outcome model and the Probit missing mechanism. Because the normal outcome model returns only one set of location parameters, we focus only on its estimators of the missing mechanism parameters.
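Under the Probit mechanism, the observed-data likelihood of the general normal mixture remains explicit: a respondent contributes $\sum_k \pi_k \sigma_k^{-1}\phi\{(y - x^{\mathrm T}\gamma_k)/\sigma_k\}\Phi(x^{\mathrm T}\alpha + \beta y)$ and a nonrespondent contributes $\sum_k \pi_k[1 - \Phi\{(x^{\mathrm T}\alpha + \beta x^{\mathrm T}\gamma_k)/\sqrt{1 + \beta^2\sigma_k^2}\}]$. The Python sketch below (NumPy/SciPy assumed; names hypothetical, not the authors' implementation) evaluates it and checks that the observed-data probabilities integrate to one at a single covariate value. It covers the Probit fits (cases 1 and 3) only; the latent Probit fit for case 2 would change the nonrespondent term.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import logsumexp
from scipy.stats import norm


def gmn_probit_obs_loglik(pis, gammas, sds, alpha, beta, x, y, r):
    """Observed-data log-likelihood for Y | x ~ sum_k pi_k N(x'gamma_k, sd_k^2) and
    P(R = 1 | x, y) = Phi(x'alpha + beta y).

    pis: (K,), gammas: (K, p), sds: (K,), alpha: (p,), x: (n, p), y: (n,), r: (n,) in {0, 1}.
    """
    mu = x @ gammas.T                       # (n, K) component means
    eta = x @ alpha                         # (n,) linear predictor without y
    obs = np.asarray(r) == 1
    log_pis = np.log(pis)
    ll = 0.0
    if obs.any():
        # Respondents: log sum_k pi_k N(y; mu_k, sd_k^2) + log Phi(eta + beta y).
        comp = log_pis + norm.logpdf(y[obs][:, None], loc=mu[obs], scale=sds)
        ll += np.sum(logsumexp(comp, axis=1) + norm.logcdf(eta[obs] + beta * y[obs]))
    if (~obs).any():
        # Nonrespondents: log sum_k pi_k [1 - Phi{(eta + beta mu_k) / sqrt(1 + beta^2 sd_k^2)}].
        z = (eta[~obs][:, None] + beta * mu[~obs]) / np.sqrt(1.0 + beta**2 * sds**2)
        ll += np.sum(logsumexp(log_pis + norm.logsf(z), axis=1))
    return ll


def total_probability(pis, gammas, sds, alpha, beta, x_row):
    """Sanity check: integral of f(y, R = 1 | x) over y plus P(R = 0 | x) should equal 1."""
    x = x_row[None, :]
    f_obs = lambda t: np.exp(gmn_probit_obs_loglik(pis, gammas, sds, alpha, beta, x,
                                                   np.array([t]), np.array([1])))
    p_obs, _ = quad(f_obs, -np.inf, np.inf)
    p_mis = np.exp(gmn_probit_obs_loglik(pis, gammas, sds, alpha, beta, x,
                                         np.array([0.0]), np.array([0])))
    return p_obs + p_mis
```

Evaluating the mixture in log space with `logsumexp` keeps the respondent terms stable when one component is far from $y$.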

Figure 3 shows the simulation results for $\beta = -0.5$; those for $\beta = -1$ are similar and omitted. To save space, we present only the results for the coefficients of $X_1$ and $Y$. Using normal mixture models for estimation, the biases for the parameters $(\gamma_1, \gamma_2, \alpha, \beta)$ are small, and they become smaller as the sample size increases. In contrast, the estimators from the normal outcome model are heavily biased. Therefore, we recommend using a general mixture of normal models for outcomes with multimodality.
7. APPLICATION


We apply our models to the ambulatory expenditure data previously analyzed by Cameron and Trivedi (2009). The dataset contains the log of ambulatory expenditure (ambexp) and the covariates X = (1, age, female, educ, blhisp, totchr, ins, income), namely age, gender, education status, ethnicity, number of chronic diseases, insurance status, and income. The dataset contains 3328 observations, of which 526 have missing values of ambexp. More details about the data can be found in Chapter 16 of Cameron and Trivedi (2009). The authors applied the Heckman Selection Model to the data and found no significant selection effect.

However, we suspect that the data do not exactly follow a normal model. We apply the general normal mixture model to the data and compare the results with those of the Heckman Selection Model and the selection-$t$ model. We present the results in Table 1. We start with two components and observe that the standard deviations of the two components, $\sigma_1$ and $\sigma_2$, are very different: the estimate of the ratio $\sigma_1/\sigma_2$ is 0.610, with 95% confidence interval (0.500, 0.716) excluding one. This provides strong empirical evidence of two latent normal components rather than a single one. We continue by fitting a general normal mixture model with three components, and find that one component has estimated proportion 0.041. Not only is the proportion of this third component small, but its regression coefficients are also similar to those of one of the other components. Therefore, we choose a general normal mixture model with two components, and omit analyses with more components.

The general normal mixture model with the Probit missing mechanism yields interesting practical interpretations. In our analysis, we first fit a general normal mixture model with different coefficients on all the covariates for the two components, and find that all regression coefficients are the same except for that of ins. Therefore, in the last two columns of Table 1 we present the results under a more parsimonious model that allows only the regression coefficient of ins to differ between components. The covariate ins has a positive although insignificant effect on log(ambexp) in a large group of people (88%), with estimate 0.009 and 95% confidence interval (−0.095, 0.174). However, ins has a significant negative effect in a small group (12%), with estimate −0.595 and 95% confidence interval (−1.705, −0.152). These results provide further evidence of two different components. Neither the Heckman Selection Model nor the selection-$t$ model can detect such latent heterogeneity, because they cannot accommodate location mixtures or multimodality. As a result, these two models mix the two groups, and consequently report no significant impact of ins on log(ambexp). The two components in the normal mixture model may reflect different health statuses. For healthy people who rarely needed health care, insurance had an insignificant effect on ambulatory expenditure; for sick people who needed more health care, insurance was beneficial for reducing ambulatory expenditure. The two components might also reflect different types of health insurance: sick people tended to buy insurance with larger coverage proportions, and consequently that insurance reduced ambulatory expenditure more.

Previous analyses found evidence of heavy-tailedness in the outcome distribution. However, the upper confidence limit of $\nu$ in the selection-$t$ model is near 20 (Marchenko and Genton, 2012; Ding, 2014), and such heavy-tailedness also seems to be captured by the normal mixture model, as illustrated in the simulation studies in Sections 6.2 and 6.3. Furthermore, the Heckman Selection Model finds an insignificant selection effect, whereas both the selection-$t$ model and the normal mixture model with the Probit missing mechanism find a significant selection effect.


8. DISCUSSION 

Dealing with missing data is crucial in many applied problems, and it is often challenging when the missing mechanisms are nonignorable. Even with a fully parametric model for the missing mechanism, nonparametric outcome models subject to nonignorable missing data are not identifiable. In this paper, we have demonstrated the identifiability of normal and normal mixture models with nonignorable missing data. The identifiability results require neither an instrumental variable for the missing mechanism nor the exclusion restriction assumption of the Heckman Selection Model (Wooldridge, 2010), i.e., that there exists a covariate appearing in the selection equation but not in the outcome equation. Although the selection-$t$ model can accommodate heavy-tailedness of the data, it cannot model outcomes with multimodality or latent groups. Our $t$ mixture model with a nonignorable missing mechanism fills this gap by allowing for heavy-tailedness and multimodality simultaneously.

There are a few questions beyond the current scope of this paper. First, determining the number of components in the mixture models (Lo et al., 2001; Chen et al., 2012) is an important problem in practice. Second, although the $t$ mixture model with the Robit missing mechanism, as discussed in Theorem 6, is useful for modeling heavy-tailed data, its maximum likelihood estimation involves numerical integration. We plan to develop an efficient Bayesian inferential procedure for it.

SUPPLEMENTARY MATERIALS 

Supplementary Materials contain the proofs of the theorems and corollaries. 
Table 1: The ambulatory expenditure study example.

                Heckman Selection Model      Selection-t                  General normal mixture
                     Est.  95% CI                 Est.  95% CI                 Est.  95% CI
Outcome model
  age               0.212  (0.167, 0.257)        0.207  (0.163, 0.251)        0.209  (0.165, 0.253)
  female            0.348  (0.230, 0.466)        0.307  (0.196, 0.417)        0.327  (0.213, 0.431)
  educ              0.019  (-0.002, 0.039)       0.017  (-0.003, 0.037)       0.020  (0.000, 0.040)
  blhisp           -0.219  (-0.336, -0.102)     -0.193  (-0.306, -0.080)     -0.211  (-0.326, -0.094)
  totchr            0.540  (0.463, 0.617)        0.513  (0.443, 0.583)        0.529  (0.460, 0.588)
  ins              -0.030  (-0.130, 0.070)      -0.053  (-0.151, 0.046)       0.009  (-0.095, 0.174)   [component 1]
                                                                             -0.595  (-1.705, -0.152)  [component 2]
  $\sigma$          1.271  (1.236, 1.308)        1.195  (1.146, 1.246)        1.159  (0.958, 1.227)    [component 1]
                                                                              1.900  (1.423, 2.383)    [component 2]
Selection model
  income            0.003  (0.000, 0.005)        0.003  (0.000, 0.006)        0.003  (0.000, 0.006)
  age               0.088  (0.034, 0.142)        0.099  (0.040, 0.157)        0.127  (0.060, 0.208)
  female            0.663  (0.543, 0.782)        0.725  (0.591, 0.859)        0.734  (0.609, 0.911)
  educ              0.062  (0.038, 0.086)        0.065  (0.040, 0.090)        0.066  (0.045, 0.091)
  blhisp           -0.364  (-0.485, -0.243)     -0.394  (-0.524, -0.263)     -0.407  (-0.548, -0.291)
  totchr            0.797  (0.658, 0.936)        0.890  (0.719, 1.061)        0.906  (0.750, 1.137)
  ins               0.170  (0.047, 0.293)        0.180  (0.048, 0.313)        0.166  (0.047, 0.289)
  $\rho$           -0.131  (-0.401, 0.161)      -0.322  (-0.526, -0.083)
  $\beta_y$                                                                  -0.174  (-0.414, -0.025)
$\pi_1$                 1                            1                        0.880  (0.401, 0.971)
log L           -5836.219                    -5822.076                    -5820.457
AIC              11706.44                     11676.15                     11680.91

Note: Point estimates and 95% confidence intervals of the Heckman Selection Model (columns 2 and 3), the selection-$t$ model (columns 4 and 5), and the general mixture of normal model (columns 6 and 7). The $\rho$ parameter is only for the Heckman Selection Model and the selection-$t$ model, and the $\beta_y$ parameter is only for the general mixture of normal model. The $\pi_1$ parameter is the proportion of the first latent component, log L is the log likelihood of the observed data, and AIC is the Akaike Information Criterion.
[Figure 1: boxplots; panels: (a) generated from the Probit missing mechanism; (b) generated from the Logistic missing mechanism; (c) generated from the Robit missing mechanism ($\nu = 2$); (d) generated from the Robit missing mechanism ($\nu = 16$); fitted mechanisms on the horizontal axis: P, L, R.]

Figure 1: Estimates when $\beta = 2$. Data are generated from normal outcome models with different missing mechanisms, and analyzed under normal outcome models with different missing mechanisms: Probit (P), Logistic (L) and Robit (R) with unknown degrees of freedom. In each boxplot, white boxes are for sample size 500 and gray boxes for 1500. The horizontal lines illustrate the true values of the parameters.
[Figure 2: boxplots; recoverable panel titles: (a) generated from scale mixture of normal outcome with the Probit missing mechanism; (b) generated from scale mixture of normal outcome with the latent Probit missing mechanism; (d) generated from $t$ outcome with the latent Probit missing mechanism.]

Figure 2: Estimates when $\beta = -0.5$. Data are generated from different outcome models, and analyzed under different outcome models with correct missing mechanisms: two-component scale mixture of normal outcome (SMN), normal outcome (N), and the selection-$t$ model (SLT). The estimators of SLT are re-parametrized to be equivalent to the scale mixture of normals with the latent Probit missing mechanism, as shown in (8). In each boxplot, white boxes are for sample size 500 and gray boxes for 1500. The horizontal lines illustrate the true values of the parameters.
[Figure 3: boxplots; panels: (a) generated from general mixture of normal outcome with the Probit missing mechanism; (b) generated from general mixture of normal outcome with the latent Probit missing mechanism; (c) generated from $t_5$ mixture outcome with the Probit missing mechanism; parameters shown: $\gamma_1$, $\gamma_2$, $\alpha$, $\beta$; fitted models: GMN, N.]

Figure 3: Estimates when $\beta = -0.5$. Data are generated from different outcome models, and analyzed under different outcome models with correct missing mechanisms: two-component general mixture of normal outcome (GMN), and normal outcome (N). In each boxplot, white boxes are for sample size 500 and gray boxes for 1500. The horizontal lines illustrate the true values of the parameters.
Supplementary Materials for “Identifiability of Normal and
Normal Mixture Models With Nonignorable Missing Data”


To prove the theorems and corollaries in this paper, we need several lemmas. 

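Lemma A.1. If $\sigma \neq \sigma'$, then for any $(\alpha, \beta, \alpha', \beta')$ and distribution functions $F_1(\cdot)$ and $F_2(\cdot)$, at least one of the following two statements holds:
$$\lim_{y\to+\infty}\frac{\phi\{(y-\mu)/\sigma\}}{\phi\{(y-\mu')/\sigma'\}}\cdot\frac{F_1(\alpha+\beta y)}{F_2(\alpha'+\beta' y)} = +\infty \text{ or } 0 \quad \text{for any } \mu, \mu';$$
$$\lim_{y\to-\infty}\frac{\phi\{(y-\mu)/\sigma\}}{\phi\{(y-\mu')/\sigma'\}}\cdot\frac{F_1(\alpha+\beta y)}{F_2(\alpha'+\beta' y)} = +\infty \text{ or } 0 \quad \text{for any } \mu, \mu'.$$

Proof. We first simplify the expression as
$$\frac{\phi\{(y-\mu)/\sigma\}}{\phi\{(y-\mu')/\sigma'\}}\cdot\frac{F_1(\alpha+\beta y)}{F_2(\alpha'+\beta' y)} = \exp\left\{\frac{(\sigma^2-\sigma'^2)y^2}{2\sigma^2\sigma'^2} + \frac{(\mu\sigma'^2-\mu'\sigma^2)y}{\sigma^2\sigma'^2} + \frac{\mu'^2\sigma^2-\mu^2\sigma'^2}{2\sigma^2\sigma'^2}\right\}\frac{F_1(\alpha+\beta y)}{F_2(\alpha'+\beta' y)}, \tag{A.1}$$
and then discuss its limit for the following four cases.

(a) $\beta > 0$, $\beta' > 0$. Because $\lim_{y\to+\infty} F_1(\alpha+\beta y)/F_2(\alpha'+\beta' y)$ is finite and positive, as $y \to +\infty$ the limit of expression (A.1) is $+\infty$ if $\sigma > \sigma'$ and 0 if $\sigma < \sigma'$, for any $\mu, \mu'$.

(b) $\beta < 0$, $\beta' < 0$. Because $\lim_{y\to-\infty} F_1(\alpha+\beta y)/F_2(\alpha'+\beta' y) = 1$, as $y \to -\infty$ the limit of expression (A.1) is $+\infty$ if $\sigma > \sigma'$ and 0 if $\sigma < \sigma'$, for any $\mu, \mu'$.

(c) $\beta > 0$, $\beta' < 0$. Because $\lim_{y\to+\infty} F_1(\alpha+\beta y)$ is finite and positive and $\lim_{y\to-\infty} F_2(\alpha'+\beta' y) = 1$, as $y \to +\infty$ the limit of expression (A.1) is $+\infty$ for $\sigma > \sigma'$, and as $y \to -\infty$ its limit is 0 for $\sigma < \sigma'$, for any $\mu, \mu'$.

(d) $\beta < 0$, $\beta' > 0$. Because $\lim_{y\to-\infty} F_1(\alpha+\beta y) = 1$ and $\lim_{y\to+\infty} F_2(\alpha'+\beta' y)$ is finite and positive, as $y \to -\infty$ the limit of expression (A.1) is $+\infty$ for $\sigma > \sigma'$, and as $y \to +\infty$ its limit is 0 for $\sigma < \sigma'$, for any $\mu, \mu'$.

Therefore, at least one of the two statements holds.

□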
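The three-term exponent in (A.1) can be checked numerically. The Python sketch below (NumPy/SciPy assumed; helper names hypothetical) compares the direct log-ratio of the two normal kernels with the expanded quadratic, and illustrates that when $\sigma > \sigma'$ the $y^2$ term drives the ratio to $+\infty$ in both tails, the mechanism behind cases (a) to (d).

```python
import numpy as np
from scipy.stats import norm


def log_density_ratio(y, mu, sigma, mu_p, sigma_p):
    """log[phi{(y - mu)/sigma} / phi{(y - mu')/sigma'}], computed directly."""
    return norm.logpdf((y - mu) / sigma) - norm.logpdf((y - mu_p) / sigma_p)


def log_density_ratio_a1(y, mu, sigma, mu_p, sigma_p):
    """The same quantity via the quadratic exponent in (A.1)."""
    s2, t2 = sigma**2, sigma_p**2
    return ((s2 - t2) * y**2 / (2 * s2 * t2)
            + (mu * t2 - mu_p * s2) * y / (s2 * t2)
            + (mu_p**2 * s2 - mu**2 * t2) / (2 * s2 * t2))
```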
Lemma A.2. If $\beta > 0$, $\beta' > 0$, then for any $\mu \neq \mu'$ and any distribution functions $F_1(\cdot)$ and $F_2(\cdot)$, we have
$$\lim_{y\to+\infty}\frac{\phi\{(y-\mu)/\sigma\}}{\phi\{(y-\mu')/\sigma\}}\cdot\frac{F_1(\alpha+\beta y)}{F_2(\alpha'+\beta' y)} = 0 \text{ or } +\infty.$$
If $\beta < 0$, $\beta' < 0$, we have the same result as $y \to -\infty$.

Proof. We simplify the expression as
$$\frac{\phi\{(y-\mu)/\sigma\}}{\phi\{(y-\mu')/\sigma\}}\cdot\frac{F_1(\alpha+\beta y)}{F_2(\alpha'+\beta' y)} = \exp\left\{\frac{(\mu-\mu')y}{\sigma^2} + \frac{\mu'^2-\mu^2}{2\sigma^2}\right\}\frac{F_1(\alpha+\beta y)}{F_2(\alpha'+\beta' y)}. \tag{A.2}$$
If $\beta > 0$ and $\beta' > 0$, $\lim_{y\to+\infty} F_1(\alpha+\beta y)/F_2(\alpha'+\beta' y)$ is finite and positive. As $y \to +\infty$, if $\mu > \mu'$, the limit of expression (A.2) is $+\infty$; if $\mu < \mu'$, its limit is 0. If $\beta < 0$ and $\beta' < 0$, we let $\tilde\beta = -\beta$, $\tilde\beta' = -\beta'$, $\tilde y = -y$, $\tilde\mu = -\mu$ and $\tilde\mu' = -\mu'$. By the same argument, we have the same result as $\tilde y \to +\infty$, or equivalently $y \to -\infty$. □


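Lemma A.3. If $\lim_{z\to-\infty} F_i(z)/e^{\delta z} = 0$ or $+\infty$ for any $\delta > 0$ ($i = 1, 2$), then for any $(\alpha, \beta, \alpha', \beta', \sigma)$ at least one of the following statements holds:
$$\lim_{y\to+\infty}\frac{\phi\{(y-\mu)/\sigma\}}{\phi\{(y-\mu')/\sigma\}}\cdot\frac{F_1(\alpha+\beta y)}{F_2(\alpha'+\beta' y)} = 0 \text{ or } +\infty \quad \text{for any } \mu \neq \mu';$$
$$\lim_{y\to-\infty}\frac{\phi\{(y-\mu)/\sigma\}}{\phi\{(y-\mu')/\sigma\}}\cdot\frac{F_1(\alpha+\beta y)}{F_2(\alpha'+\beta' y)} = 0 \text{ or } +\infty \quad \text{for any } \mu \neq \mu'.$$

Proof. We first simplify the expression as
$$\frac{\phi\{(y-\mu)/\sigma\}}{\phi\{(y-\mu')/\sigma\}}\cdot\frac{F_1(\alpha+\beta y)}{F_2(\alpha'+\beta' y)} = \exp\left\{\frac{(\mu-\mu')y}{\sigma^2} + \frac{\mu'^2-\mu^2}{2\sigma^2}\right\}\frac{F_1(\alpha+\beta y)}{F_2(\alpha'+\beta' y)}, \tag{A.3}$$
and then discuss its limit for the following four cases.

(a) $\beta > 0$, $\beta' > 0$. We have proved this case in Lemma A.2.

(b) $\beta < 0$, $\beta' < 0$. Letting $\tilde\beta = -\beta$, $\tilde\beta' = -\beta'$, $\tilde y = -y$, $\tilde\mu = -\mu$ and $\tilde\mu' = -\mu'$, we have $\tilde\beta > 0$, $\tilde\beta' > 0$ and
$$\frac{\phi\{(y-\mu)/\sigma\}}{\phi\{(y-\mu')/\sigma\}}\cdot\frac{F_1(\alpha+\beta y)}{F_2(\alpha'+\beta' y)} = \frac{\phi\{(\tilde y-\tilde\mu)/\sigma\}}{\phi\{(\tilde y-\tilde\mu')/\sigma\}}\cdot\frac{F_1(\alpha+\tilde\beta\tilde y)}{F_2(\alpha'+\tilde\beta'\tilde y)}.$$
By Lemma A.2, its limit is either 0 or $+\infty$ as $\tilde y \to +\infty$, or equivalently $y \to -\infty$, for any $\mu \neq \mu'$.

(c) $\beta > 0$, $\beta' < 0$. First, we discuss the case with $\mu > \mu'$. We have $\lim_{y\to+\infty} F_2(\alpha'+\beta' y) = 0$, $\lim_{y\to+\infty} F_1(\alpha+\beta y)$ is finite and positive, and $\lim_{y\to+\infty}\exp\{(\mu-\mu')y/\sigma^2\} = +\infty$. Therefore, the limit of (A.3) is $+\infty$. Second, we discuss the case with $\mu < \mu'$. As $y \to +\infty$, $z = \alpha' + \beta' y \to -\infty$, and $\exp\{(\mu-\mu')y/\sigma^2\} = e^{\delta(z-\alpha')}$ with $\delta = (\mu-\mu')/(\sigma^2\beta') > 0$. Because $\lim_{z\to-\infty} F_2(z)/e^{\delta z} = 0$ or $+\infty$ for this $\delta$, the limit of (A.3) is 0 or $+\infty$.

(d) $\beta < 0$, $\beta' > 0$. Exchanging the numerator and denominator in (A.3), we can follow (c) to prove this case.

□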


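Lemma A.4. For any distribution function $F(\cdot)$, the limit
$$\lim_{y\to+\infty}\frac{\phi\{(y-\mu)/\sigma\}}{\phi\{(y-\mu')/\sigma\}}\cdot\frac{1}{F(\alpha+\beta' y)}$$
is finite and positive if and only if
(a) $\beta' > 0$ and $\mu' = \mu$; or
(b) $\beta' < 0$, $\mu' > \mu$, and $\lim_{z\to-\infty} F(z)/e^{\delta z} = c \in (0, +\infty)$, where $\delta = (\mu-\mu')/(\sigma^2\beta')$ and $c = \exp\{(\mu'^2-\mu^2)/(2\sigma^2) - \delta\alpha\}$.

Proof. We have
$$\lim_{y\to+\infty}\frac{\phi\{(y-\mu)/\sigma\}}{\phi\{(y-\mu')/\sigma\}}\cdot\frac{1}{F(\alpha+\beta' y)} = \exp\left(\frac{\mu'^2-\mu^2}{2\sigma^2}\right)\lim_{y\to+\infty}\frac{\exp\{(\mu-\mu')y/\sigma^2\}}{F(\alpha+\beta' y)}.$$
For the case with $\beta' > 0$, the limit is finite and positive if and only if $\mu' = \mu$. For the case with $\beta' < 0$, write $z = \alpha + \beta' y \to -\infty$, so that $(\mu-\mu')y/\sigma^2 = \delta(z-\alpha)$; the limit is finite and positive if and only if $\lim_{z\to-\infty} F(z)/e^{\delta z} = c \in (0, +\infty)$, where
$$\delta = \frac{\mu-\mu'}{\sigma^2\beta'} > 0, \qquad c = \exp\left(\frac{\mu'^2-\mu^2}{2\sigma^2} - \delta\alpha\right).$$

□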
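Case (b) of Lemma A.4 can be seen concretely with the Logistic CDF, for which $F(z)/e^{z} \to 1$ as $z \to -\infty$, so $\delta = 1$ gives a finite positive limit and the ratio in Lemma A.4 tends to $\exp\{(\mu'^2-\mu^2)/(2\sigma^2) - \delta\alpha\}$. The Python sketch below (NumPy/SciPy assumed; names and parameter values are illustrative) evaluates the log of that ratio at a large $y$, and shows that with the Probit CDF the same ratio diverges, since $\Phi(z)/e^{\delta z} \to 0$.

```python
import numpy as np
from scipy.stats import norm


def log_lemma_a4_ratio(y, mu, mu_p, sigma, alpha, beta_p, log_cdf):
    """log of phi{(y - mu)/sigma} / phi{(y - mu')/sigma} / F(alpha + beta' y), F given by its log-CDF."""
    return (norm.logpdf((y - mu) / sigma) - norm.logpdf((y - mu_p) / sigma)
            - log_cdf(alpha + beta_p * y))


def logistic_log_cdf(z):
    # log{1 / (1 + e^{-z})}, evaluated stably
    return -np.logaddexp(0.0, -z)


mu, sigma, alpha, beta_p = 0.0, 1.0, 0.3, -1.0
# delta = (mu - mu') / (sigma^2 beta') = 1 requires mu' = mu - sigma^2 beta' (> mu since beta' < 0).
mu_p = mu - sigma**2 * beta_p
# Log of the limiting ratio exp{(mu'^2 - mu^2)/(2 sigma^2) - delta alpha}, divided by lim F(z)/e^z = 1.
limit_log = (mu_p**2 - mu**2) / (2 * sigma**2) - 1.0 * alpha
```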


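Lemma A.5. For any positive integer $K = \sum_{i=1}^{I} J_i$, parameters $\{(\sigma_i^2, \mu_{ij}): i = 1, \ldots, I;\ j = 1, \ldots, J_i\}$, $\alpha$ and $\beta$, if $\sigma_{i+1}^2 < \sigma_i^2$ and $\mu_{i(j+1)} < \mu_{ij}$, then the functions
$$\left\{Q_{ij}(y) = \frac{1}{\sigma_i}\phi\left(\frac{y-\mu_{ij}}{\sigma_i}\right)F\left[\frac{\alpha\psi(\sigma_i)+\beta\{y-\kappa(\mu_{ij})\}}{\varphi(\sigma_i)}\right]: i = 1, \ldots, I;\ j = 1, \ldots, J_i\right\}$$
are linearly independent.

Proof. Suppose there were real numbers $\{p_{ij}: i = 1, \ldots, I;\ j = 1, \ldots, J_i\}$ such that
$$\sum_{i=1}^{I}\sum_{j=1}^{J_i} p_{ij}Q_{ij}(y) = 0 \tag{A.4}$$
for any $y$. We will discuss Equation (A.4) for the following two cases.

(a) $\beta > 0$. Dividing Equation (A.4) by $\phi\{(y-\mu_{11})/\sigma_1\}/\sigma_1$, we have $S_1(y) + S_2(y) + S_3(y) = 0$ for any $y$, where
$$S_1(y) = p_{11}F\left[\frac{\alpha\psi(\sigma_1)+\beta\{y-\kappa(\mu_{11})\}}{\varphi(\sigma_1)}\right],$$
$$S_2(y) = \sum_{j=2}^{J_1} p_{1j}\exp\left\{\frac{(\mu_{1j}-\mu_{11})y}{\sigma_1^2} + \frac{\mu_{11}^2-\mu_{1j}^2}{2\sigma_1^2}\right\}F\left[\frac{\alpha\psi(\sigma_1)+\beta\{y-\kappa(\mu_{1j})\}}{\varphi(\sigma_1)}\right],$$
$$S_3(y) = \sum_{i=2}^{I}\sum_{j=1}^{J_i}\frac{p_{ij}\sigma_1}{\sigma_i}\exp\left\{-\frac{(y-\mu_{ij})^2}{2\sigma_i^2} + \frac{(y-\mu_{11})^2}{2\sigma_1^2}\right\}F\left[\frac{\alpha\psi(\sigma_i)+\beta\{y-\kappa(\mu_{ij})\}}{\varphi(\sigma_i)}\right]$$
correspond to the first term, the sum of the second to the $J_1$th terms with variance $\sigma_1^2$, and the sum of the remaining terms of Equation (A.4), respectively. Because $\mu_{1j} < \mu_{11}$ for $j \geq 2$ and $\sigma_i^2 < \sigma_1^2$ for $i \geq 2$, we can verify that $\lim_{y\to+\infty} S_2(y) = 0$ and $\lim_{y\to+\infty} S_3(y) = 0$. Therefore, we must have $\lim_{y\to+\infty} S_1(y) = 0$. Because $\beta > 0$ implies
$$\lim_{y\to+\infty} F\left[\frac{\alpha\psi(\sigma_1)+\beta\{y-\kappa(\mu_{11})\}}{\varphi(\sigma_1)}\right] > 0,$$
we have $p_{11} = 0$. By the same argument, we can prove that $p_{ij} = 0$ for all $i, j$. Therefore, the functions $Q_{ij}(y)$ are linearly independent.

(b) $\beta < 0$. We let $\tilde\beta = -\beta$, $\tilde y = -y$, $\tilde\mu_{ij} = -\mu_{ij}$, $\tilde\kappa(\tilde\mu_{ij}) = -\kappa(\mu_{ij})$ and
$$\tilde Q_{ij}(\tilde y) = Q_{ij}(y) = \frac{1}{\sigma_i}\phi\left(\frac{\tilde y-\tilde\mu_{ij}}{\sigma_i}\right)F\left[\frac{\alpha\psi(\sigma_i)+\tilde\beta\{\tilde y-\tilde\kappa(\tilde\mu_{ij})\}}{\varphi(\sigma_i)}\right].$$
Because $\tilde\beta > 0$, the functions $\{\tilde Q_{ij}(\tilde y): i = 1, \ldots, I;\ j = 1, \ldots, J_i\}$ are linearly independent according to the discussion in case (a), after relabeling $j$ within each $i$ so that the $\tilde\mu_{ij}$ are in decreasing order. So $\{Q_{ij}(y): i = 1, \ldots, I;\ j = 1, \ldots, J_i\}$ are linearly independent.

□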
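Linear independence cannot be established numerically, but a quick sanity check of Lemma A.5 is that the functions evaluated on a grid form a full-rank matrix. The Python sketch below (NumPy/SciPy assumed) uses the latent Probit choice $\psi(\sigma) = 1$, $\kappa(\mu) = \mu$, $\varphi(\sigma) = \sigma$; the grid and parameter values are illustrative assumptions that satisfy the ordering conditions of the lemma.

```python
import numpy as np
from scipy.stats import norm


def q_functions(ys, sds, means, alpha, beta, psi, kappa, varphi, F=norm.cdf):
    """Columns Q_ij(y) = phi{(y - mu_ij)/sigma_i}/sigma_i * F[{alpha psi(sigma_i) + beta(y - kappa(mu_ij))}/varphi(sigma_i)]."""
    cols = []
    for s, m in zip(sds, means):
        arg = (alpha * psi(s) + beta * (ys - kappa(m))) / varphi(s)
        cols.append(norm.pdf((ys - m) / s) / s * F(arg))
    return np.column_stack(cols)


ys = np.linspace(-8.0, 8.0, 801)
# sigma_1 = 2 > sigma_2 = 1, and means decreasing within each variance group.
sds = [2.0, 2.0, 1.0, 1.0]
means = [1.0, -1.0, 0.5, -0.5]
Q = q_functions(ys, sds, means, alpha=0.5, beta=-1.0,
                psi=lambda s: 1.0, kappa=lambda m: m, varphi=lambda s: s)
```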

Proof of Theorem 1. We use a proof by contradiction to show that the parameters can be identified by the observed distribution $P(y, R = 1)$. Suppose that there were two sets of parameters satisfying the same observed distribution:
$$\frac{1}{\sigma}\phi\left(\frac{y-\mu}{\sigma}\right)F(\alpha+\beta y) = \frac{1}{\sigma'}\phi\left(\frac{y-\mu'}{\sigma'}\right)F(\alpha'+\beta' y). \tag{A.5}$$
Below we show the results (a), (b) and (c) of the theorem one by one.

(a) Equation (A.5) implies
$$\frac{\sigma'\phi\{(y-\mu)/\sigma\}}{\sigma\phi\{(y-\mu')/\sigma'\}}\cdot\frac{F(\alpha+\beta y)}{F(\alpha'+\beta' y)} = 1. \tag{A.6}$$
Lemma A.1 implies that $\sigma \neq \sigma'$ is impossible, since then, as $y \to +\infty$ or as $y \to -\infty$, the limit of (A.6) is either 0 or $+\infty$. Thus we must have $\sigma = \sigma'$. To prove $|\beta| = |\beta'|$, we discuss the following four cases.

(a.1) $\beta > 0$, $\beta' > 0$. We must have $\mu = \mu'$, because otherwise, as $y \to +\infty$, the limit of (A.6) is either 0 or $+\infty$ by Lemma A.2. Therefore, $F(\alpha+\beta y) = F(\alpha'+\beta' y)$ for any $y$. Because $F(\cdot)$ is strictly monotone, we must have $\alpha = \alpha'$ and $\beta = \beta'$.

(a.2) $\beta > 0$, $\beta' < 0$. Because $\lim_{y\to+\infty} F(\alpha+\beta y)$ is finite and positive, the limit
$$\lim_{y\to+\infty}\frac{\phi\{(y-\mu)/\sigma\}}{\phi\{(y-\mu')/\sigma\}}\cdot\frac{1}{F(\alpha'+\beta' y)}$$
is finite and positive. According to Lemma A.4, we must have $\lim_{z\to-\infty} F(z)/e^{\delta z} = c \in (0, +\infty)$ with $\delta = (\mu-\mu')/(\sigma^2\beta')$. Letting $y \to -\infty$, a similar application of Lemma A.4 to (A.6) gives $\delta = (\mu'-\mu)/(\sigma^2\beta)$. Because $F(z)/e^{\delta z}$ can have a finite and positive limit for at most one value of $\delta$, the two expressions coincide, and therefore $\beta = -\beta'$.

(a.3) $\beta < 0$, $\beta' < 0$. The discussion is similar to (a.1) by letting $y \to -\infty$.

(a.4) $\beta < 0$, $\beta' > 0$. The discussion is similar to (a.2).

(b) From (a), $\sigma^2$ and $|\beta|$ are identifiable. When the sign of $\beta$ is known, we need only consider the cases (a.1) or (a.3), and we have proved identifiability of all the parameters in (a.1) and (a.3).

(c) We will prove that under Condition A the cases (a.2) and (a.4) are impossible. For case (a.2) with $\beta > 0$, $\beta' < 0$, as $y \to +\infty$ the limit of $F(\alpha+\beta y)$ is finite and positive, and therefore the limit of
$$\frac{\phi\{(y-\mu)/\sigma\}}{\phi\{(y-\mu')/\sigma\}}\cdot\frac{1}{F(\alpha'+\beta' y)}$$
is also finite and positive. Due to Lemma A.4 and $\beta' < 0$, we must have $\lim_{z\to-\infty} F(z)/e^{\delta z} = c \in (0, +\infty)$ for some $\delta > 0$, which contradicts Condition A. Similarly, the case (a.4) is also impossible. For cases (a.1) and (a.3) we have already established the identifiability of the parameters.

□

Proof of Corollary 1. Suppose that two sets of parameters $(\alpha, \nu, \mu, \sigma, \beta)$ and $(\alpha', \nu', \mu', \sigma', \beta')$ satisfy the same observed distribution:
$$\frac{1}{\sigma}\phi\left(\frac{y-\mu}{\sigma}\right)T_\nu(\alpha+\beta y) = \frac{1}{\sigma'}\phi\left(\frac{y-\mu'}{\sigma'}\right)T_{\nu'}(\alpha'+\beta' y).$$
Similar to the proof of Theorem 1(a), we can show $\sigma = \sigma'$ according to Lemma A.1. For any $\nu$, the density of a $t$ random variable with $\nu$ degrees of freedom satisfies $t_\nu(z) \propto (1 + z^2/\nu)^{-(\nu+1)/2}$, and thus $T_\nu(z)$ satisfies Condition A in Theorem 1 by L'Hospital's rule. Therefore, cases (a.2) and (a.4) in the proof of Theorem 1 are impossible according to the proof of Theorem 1(c). Below we discuss cases (a.1) and (a.3).

For case (a.1) with $\beta > 0$, $\beta' > 0$, we can show $\mu = \mu'$ according to Lemma A.2, similar to the proof of Theorem 1. Because $(\mu, \sigma^2)$ are identifiable, we have $T_\nu(\alpha+\beta y) = T_{\nu'}(\alpha'+\beta' y)$ for any $y$. Under the condition $\beta \neq 0$ required by Corollary 1, we can show that, as $y \to -\infty$, the limit of $T_\nu(\alpha+\beta y)/T_{\nu'}(\alpha'+\beta' y)$ is 0 if $\beta' = 0$. Therefore, $\beta' > 0$. Then, as $y \to -\infty$, the limit of $T_\nu(\alpha+\beta y)/T_{\nu'}(\alpha'+\beta' y)$ is $+\infty$ if $\nu < \nu'$, and 0 if $\nu > \nu'$. Therefore $\nu = \nu'$, and thus $\alpha = \alpha'$, $\beta = \beta'$. The discussion for case (a.3) is similar to case (a.1). □
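Condition A requires $F(z)/e^{\delta z}$ to tend to 0 or $+\infty$ as $z \to -\infty$ for every $\delta > 0$. The Python sketch below (NumPy/SciPy assumed; helper names hypothetical) evaluates $\log\{F(z)/e^{\delta z}\}$ at $\delta = 1$ for decreasing $z$. Finite evaluations only illustrate the limits: the Probit ratio vanishes, the Robit ratio with $\nu = 5$ diverges, consistent with the L'Hospital argument above, and the Logistic ratio tends to one, so the Logistic CDF does not satisfy Condition A.

```python
import numpy as np
from scipy.stats import norm, t


def log_tail_ratio(log_cdf, z, delta):
    """log{F(z) / e^{delta z}} for a CDF F given through its log-CDF."""
    return log_cdf(z) - delta * z


def logistic_log_cdf(z):
    return -np.logaddexp(0.0, -z)              # log{1 / (1 + e^{-z})}


def robit_log_cdf(z, nu=5.0):
    return t.logcdf(z, df=nu)                  # Robit link: t CDF with nu degrees of freedom


zs = np.array([-10.0, -20.0, -40.0])
probit_vals = log_tail_ratio(norm.logcdf, zs, 1.0)         # decreasing toward -inf: ratio -> 0
logistic_vals = log_tail_ratio(logistic_log_cdf, zs, 1.0)  # tending to 0: ratio -> 1
robit_vals = log_tail_ratio(robit_log_cdf, zs, 1.0)        # increasing toward +inf: ratio -> +inf
```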

Proof of Theorem 2. Suppose that there were two sets of parameters satisfying the same observed distribution:
$$\frac{1}{\sigma(x,\theta)}\phi\left\{\frac{y-\mu(x,\gamma)}{\sigma(x,\theta)}\right\}F\{g(x,\alpha)+\beta y\} = \frac{1}{\sigma(x,\theta')}\phi\left\{\frac{y-\mu(x,\gamma')}{\sigma(x,\theta')}\right\}F\{g(x,\alpha')+\beta' y\}. \tag{A.7}$$
Replacing $\mu$, $\sigma^2$ and $\alpha$ in the proof of Theorem 1 with $\mu(x,\gamma)$, $\sigma^2(x,\theta)$ and $g(x,\alpha)$, and conditioning on $X = x$, the identifiability of $\mu(x,\gamma)$, $\sigma^2(x,\theta)$, $g(x,\alpha)$ and $\beta$ is the same as that of $\mu$, $\sigma^2$, $\alpha$ and $\beta$ in Theorem 1. As $x$ varies, we obtain identifiability of the functions $\mu(\cdot,\gamma)$, $\sigma^2(\cdot,\theta)$ and $g(\cdot,\alpha)$. Because there are one-to-one mappings between the parameters $(\gamma, \theta, \alpha)$ and these functions, the parameters are identifiable. Therefore, the results (a), (b) and (c) of Theorem 2 hold.
Now we prove the result (d). Similar to the proof of Theorem 1, we can show the identifiability of the parameters for cases (a.1) $\beta > 0, \beta' > 0$ and (a.3) $\beta < 0, \beta' < 0$. We need only show that (a.2) $\beta > 0, \beta' < 0$ and (a.4) $\beta < 0, \beta' > 0$ are impossible under the condition in (d) of Theorem 2. Without loss of generality, we show case (a.2); the discussion for case (a.4) is analogous.

Note that $\theta$ is identifiable and $\beta' = -\beta$ by the result (a). For case (a.2) with $\beta > 0$ and $\beta' < 0$, Equation (A.7) implies that
$$\lim_{y\to\pm\infty}\frac{\phi[\{y-\mu(x,\gamma)\}/\sigma(x,\theta)]}{\phi[\{y-\mu(x,\gamma')\}/\sigma(x,\theta)]}\cdot\frac{F\{g(x,\alpha)+\beta y\}}{F\{g(x,\alpha')+\beta' y\}} = 1. \tag{A.8}$$
We let $y \to +\infty$ and apply Lemma A.4; we then let $y \to -\infty$ and apply Lemma A.4 again. Consequently, we must have two sets of conditions, i.e., $\lim_{z\to-\infty} F(z)/e^{\delta z} = c$ with
$$\delta = \frac{\mu(x,\gamma)-\mu(x,\gamma')}{\sigma^2(x,\theta)\beta'} = \frac{\mu(x,\gamma')-\mu(x,\gamma)}{\sigma^2(x,\theta)\beta}, \tag{A.9}$$
$$\log c = \frac{\mu^2(x,\gamma')-\mu^2(x,\gamma)}{2\sigma^2(x,\theta)} - \delta g(x,\alpha') = \frac{\mu^2(x,\gamma)-\mu^2(x,\gamma')}{2\sigma^2(x,\theta)} - \delta g(x,\alpha). \tag{A.10}$$
Equation (A.10) holds for any $x$ and therefore holds for a particular $x_0$. Taking the difference between Equation (A.10) for $x$ and for $x_0$, we have that
$$-\delta\{g(x,\alpha)-g(x_0,\alpha)\} = \frac{\mu^2(x,\gamma')-\mu^2(x,\gamma)}{2\sigma^2(x,\theta)} - \frac{\mu^2(x_0,\gamma')-\mu^2(x_0,\gamma)}{2\sigma^2(x_0,\theta)},$$
$$\delta\{g(x,\alpha')-g(x_0,\alpha')\} = \frac{\mu^2(x,\gamma')-\mu^2(x,\gamma)}{2\sigma^2(x,\theta)} - \frac{\mu^2(x_0,\gamma')-\mu^2(x_0,\gamma)}{2\sigma^2(x_0,\theta)}.$$
From the first identity of (A.9), we have $\mu(x,\gamma') = \mu(x,\gamma) + \delta\beta\sigma^2(x,\theta)$. Plugging it into the above two equations, we have
$$-g(x,\alpha) + g(x_0,\alpha) = \beta\{\mu(x,\gamma)-\mu(x_0,\gamma)\} + \frac{\delta\beta^2}{2}\{\sigma^2(x,\theta)-\sigma^2(x_0,\theta)\}, \tag{A.11}$$
$$g(x,\alpha') - g(x_0,\alpha') = \beta\{\mu(x,\gamma)-\mu(x_0,\gamma)\} + \frac{\delta\beta^2}{2}\{\sigma^2(x,\theta)-\sigma^2(x_0,\theta)\}. \tag{A.12}$$
Either one of (A.11) and (A.12) conflicts with the condition that $a\cdot\mu(x,\gamma) + b\cdot\sigma^2(x,\theta) + g(x,\alpha) \neq c$ for any nonzero vector $(a, b, c)$ and for any $(\gamma, \theta, \alpha)$. Thus (a.2) is impossible.

□


Proof of Theorem 3. Theorem 3 is a special case of Theorem 2 with $\sigma(x,\theta) = \sigma$ being a constant. The first identity of Equation (A.9) implies that, for any $x$ and a particular $x_0$,
$$\frac{\mu(x,\gamma)-\mu(x,\gamma')}{\sigma^2\beta} = \frac{\mu(x_0,\gamma)-\mu(x_0,\gamma')}{\sigma^2\beta}.$$
Therefore, we have $\mu(x,\gamma) - \mu(x_0,\gamma) = \mu(x,\gamma') - \mu(x_0,\gamma')$, implying $\partial\mu(x,\gamma)/\partial x = \partial\mu(x,\gamma')/\partial x$. Because $\sigma(x,\theta) = \sigma$ is a constant, Equations (A.11) and (A.12) imply $g(x,\alpha') - g(x_0,\alpha') = -g(x,\alpha) + g(x_0,\alpha)$, and therefore $\partial g(x,\alpha')/\partial x = -\partial g(x,\alpha)/\partial x$. □

Proof of Corollary 2. The models of Corollary 2 are special cases of Theorem 2 with $\sigma(x,\theta) = \sigma$ being a constant, $\mu(x,\gamma) = \gamma_0 + x^{\mathrm T}\gamma_1$ and $g(x,\alpha) = \alpha_0 + x^{\mathrm T}\alpha_1$. Below we show the results (a) and (b) one by one.

(a) Similar to the proof of Theorem 2, we need only show that cases (a.2) $\beta > 0, \beta' < 0$ and (a.4) $\beta < 0, \beta' > 0$ are impossible. Without loss of generality, we discuss only case (a.2). If there were two different sets of parameters satisfying the same observed distribution, we would obtain from Equations (A.9) and (A.10) that
$$\gamma_0' = \gamma_0 + \delta\sigma^2\beta, \quad \alpha_0' = -\alpha_0 - 2\log(c)/\delta, \quad \gamma_1' = \gamma_1, \quad \beta' = -\beta, \quad \alpha_1' = -\alpha_1, \tag{A.13}$$
$$\alpha_1 + \beta\gamma_1 = 0, \qquad 2\delta\alpha_0 + 2\delta\beta\gamma_0 + \delta^2\sigma^2\beta^2 + 2\log(c) = 0. \tag{A.14}$$
The above equations imply that $\tau_1 = 0$ and $\tau_2 = 0$, which conflicts with the condition that $\tau_1 \neq 0$ or $\tau_2 \neq 0$. So the case (a.2) is impossible. The proof for (a.4) is similar. Therefore, $\tau_1 \neq 0$ or $\tau_2 \neq 0$ is a sufficient condition for identifiability of the parameters.

(b) For the Logistic missing mechanism, we can further prove the necessity of the condition $\tau_1 \neq 0$ or $\tau_2 \neq 0$. If $\tau_1 = 0$ and $\tau_2 = 0$, we can verify the following equation:
$$\exp\left\{-\frac{(y-\gamma_0-x^{\mathrm T}\gamma_1)^2}{2\sigma^2}\right\}\frac{\exp(\alpha_0+x^{\mathrm T}\alpha_1+\beta y)}{1+\exp(\alpha_0+x^{\mathrm T}\alpha_1+\beta y)} = \exp\left\{-\frac{(y-\gamma_0'-x^{\mathrm T}\gamma_1')^2}{2\sigma^2}\right\}\frac{\exp(\alpha_0'+x^{\mathrm T}\alpha_1'+\beta' y)}{1+\exp(\alpha_0'+x^{\mathrm T}\alpha_1'+\beta' y)},$$
i.e., two sets of parameters satisfying Equations (A.13) and (A.14) must also have the same observed distribution if $\tau_1 = 0$ and $\tau_2 = 0$. Therefore, $\tau_1 \neq 0$ or $\tau_2 \neq 0$ is also a necessary condition for identifiability of the parameters.

□
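Part (b) can be checked numerically. For the Logistic CDF, $F(z)/e^{z} \to 1$ as $z \to -\infty$, so $\delta = 1$ and $c = 1$, and (A.13) and (A.14) reduce to $\gamma_0' = \gamma_0 + \sigma^2\beta$, $\alpha_0' = -\alpha_0$, $\alpha_1 + \beta\gamma_1 = 0$ and $2\alpha_0 + 2\beta\gamma_0 + \sigma^2\beta^2 = 0$. The Python sketch below (NumPy/SciPy assumed; a scalar covariate is used only for brevity, and all names and values are illustrative) builds two distinct parameter sets satisfying these equations and confirms that they give the same $f(y, R = 1 \mid x)$ on a grid; since $P(R = 0 \mid x) = 1 - \int f(y, R = 1 \mid x)\,dy$, the full observed distributions then coincide. Replacing the Logistic link by the Probit link breaks the equality.

```python
import numpy as np
from scipy.special import expit
from scipy.stats import norm


def observed_density(y, x, link, gamma0, gamma1, alpha0, alpha1, beta, sigma):
    """f(y, R = 1 | x) = N(y; gamma0 + x gamma1, sigma^2) * F(alpha0 + x alpha1 + beta y), scalar x."""
    return norm.pdf(y, loc=gamma0 + x * gamma1, scale=sigma) * link(alpha0 + x * alpha1 + beta * y)


# First set: both identities in (A.14) hold with delta = 1 and c = 1.
sigma, beta, gamma0, gamma1 = 1.0, 1.0, 0.0, 1.0
alpha1 = -beta * gamma1                               # alpha_1 + beta gamma_1 = 0
alpha0 = -beta * gamma0 - 0.5 * sigma**2 * beta**2    # 2 alpha_0 + 2 beta gamma_0 + sigma^2 beta^2 = 0
theta = dict(gamma0=gamma0, gamma1=gamma1, alpha0=alpha0, alpha1=alpha1, beta=beta, sigma=sigma)

# Second set from (A.13) with delta = 1 and c = 1.
theta_prime = dict(gamma0=gamma0 + sigma**2 * beta, gamma1=gamma1,
                   alpha0=-alpha0, alpha1=-alpha1, beta=-beta, sigma=sigma)

ys = np.linspace(-6.0, 6.0, 241)
```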


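Proof of Theorems 4 and 5. Theorem 4 is the special case of Theorem 5 with $\varphi(\cdot) = 1$, $\kappa(\cdot) = 0$ and $\psi(\cdot) = 1$. Therefore, we prove only Theorem 5.

For notational convenience, we use double indices $(i, j)$ for $k$ such that $\{\sigma_i^2: i = 1, \ldots, I\}$ are sorted in decreasing order, and then $\{\mu_{ij}: j = 1, \ldots, J_i\}$, with $\sum_{i=1}^{I} J_i = K$, are sorted in decreasing order for each $i$. We first define
$$Q_{ij}(y) = \frac{\pi_{ij}}{\sigma_i}\phi\left(\frac{y-\mu_{ij}}{\sigma_i}\right)F\left[\frac{\alpha\psi(\sigma_i)+\beta\{y-\kappa(\mu_{ij})\}}{\varphi(\sigma_i)}\right],$$
$$h_1(y) = \sum_{i=1}^{I}\sum_{j=1}^{J_i}\frac{Q_{ij}(y)}{Q_{11}(y)}, \qquad h_2(y) = \sum_{i=1}^{I}\sum_{j=1}^{J_i}\frac{Q_{ij}(y)}{Q_{1J_1}(y)}.$$
Suppose that there were another set of parameters having the same observed distribution, and we use $Q'_{ij}(y)$, $h'_1(y)$ and $h'_2(y)$ to denote the functions under this set of parameters. We have
$$\sum_{i=1}^{I}\sum_{j=1}^{J_i} Q_{ij}(y) = \sum_{i=1}^{I'}\sum_{j=1}^{J'_i} Q'_{ij}(y). \tag{A.15}$$
By the definitions of $h_1(y)$, $h_2(y)$, $h'_1(y)$ and $h'_2(y)$, we have
$$\frac{Q_{11}(y)h_1(y)}{Q'_{11}(y)h'_1(y)} = 1, \qquad \frac{Q_{1J_1}(y)h_2(y)}{Q'_{1J'_1}(y)h'_2(y)} = 1. \tag{A.16}$$
We can re-express $h_1(y)$ as
$$h_1(y) = 1 + \sum_{j=2}^{J_1}\frac{\pi_{1j}}{\pi_{11}}\exp\left\{\frac{(\mu_{1j}-\mu_{11})y}{\sigma_1^2} + \frac{\mu_{11}^2-\mu_{1j}^2}{2\sigma_1^2}\right\}\frac{F[\{\alpha\psi(\sigma_1)+\beta\{y-\kappa(\mu_{1j})\}\}/\varphi(\sigma_1)]}{F[\{\alpha\psi(\sigma_1)+\beta\{y-\kappa(\mu_{11})\}\}/\varphi(\sigma_1)]} + \sum_{i=2}^{I}\sum_{j=1}^{J_i}\frac{\pi_{ij}\sigma_1}{\pi_{11}\sigma_i}\exp\left\{-\frac{(y-\mu_{ij})^2}{2\sigma_i^2} + \frac{(y-\mu_{11})^2}{2\sigma_1^2}\right\}\frac{F[\{\alpha\psi(\sigma_i)+\beta\{y-\kappa(\mu_{ij})\}\}/\varphi(\sigma_i)]}{F[\{\alpha\psi(\sigma_1)+\beta\{y-\kappa(\mu_{11})\}\}/\varphi(\sigma_1)]}.$$
Note that $\varphi(\cdot)$ and $\kappa(\cdot)$ are increasing functions, and therefore the limit
$$\lim_{y\to+\infty}\frac{F[\{\alpha\psi(\sigma_i)+\beta\{y-\kappa(\mu_{ij})\}\}/\varphi(\sigma_i)]}{F[\{\alpha\psi(\sigma_1)+\beta\{y-\kappa(\mu_{11})\}\}/\varphi(\sigma_1)]}$$
must be finite and positive regardless of the sign of $\beta$. The exponential terms in $h_1(y)$ converge to 0 as $y \to +\infty$, so we have $\lim_{y\to+\infty} h_1(y) = 1$. Similarly, $\lim_{y\to+\infty} h'_1(y) = 1$, $\lim_{y\to-\infty} h_2(y) = 1$, and $\lim_{y\to-\infty} h'_2(y) = 1$. From Equation (A.16), we have that
$$\lim_{y\to+\infty}\frac{Q_{11}(y)}{Q'_{11}(y)} = \lim_{y\to+\infty}\frac{\pi_{11}\sigma'_1\,\phi\{(y-\mu_{11})/\sigma_1\}\,F[\{\alpha\psi(\sigma_1)+\beta\{y-\kappa(\mu_{11})\}\}/\varphi(\sigma_1)]}{\pi'_{11}\sigma_1\,\phi\{(y-\mu'_{11})/\sigma'_1\}\,F[\{\alpha'\psi(\sigma'_1)+\beta'\{y-\kappa(\mu'_{11})\}\}/\varphi(\sigma'_1)]} = 1, \tag{A.17}$$
$$\lim_{y\to-\infty}\frac{Q_{1J_1}(y)}{Q'_{1J'_1}(y)} = \lim_{y\to-\infty}\frac{\pi_{1J_1}\sigma'_1\,\phi\{(y-\mu_{1J_1})/\sigma_1\}\,F[\{\alpha\psi(\sigma_1)+\beta\{y-\kappa(\mu_{1J_1})\}\}/\varphi(\sigma_1)]}{\pi'_{1J'_1}\sigma_1\,\phi\{(y-\mu'_{1J'_1})/\sigma'_1\}\,F[\{\alpha'\psi(\sigma'_1)+\beta'\{y-\kappa(\mu'_{1J'_1})\}\}/\varphi(\sigma'_1)]} = 1. \tag{A.18}$$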
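If $\sigma_1 \neq \sigma'_1$, Lemma A.1 implies that at least one of the limits in (A.17) and (A.18) must be 0 or $+\infty$, which contradicts the limits above. Therefore, we must have $\sigma_1 = \sigma'_1$. We divide the remaining discussion into the following three cases.

(a) $\beta > 0$. We first observe that
$$\lim_{y\to+\infty} F\left[\frac{\alpha\psi(\sigma_1)+\beta\{y-\kappa(\mu_{11})\}}{\varphi(\sigma_1)}\right]$$
is finite and positive. We apply Lemma A.4 to Equation (A.17) and obtain $\mu_{11} = \mu'_{11}$ and $\beta' \geq 0$, because Condition A rules out the second case in Lemma A.4. We further demonstrate that $\beta' = 0$ is also impossible; otherwise, Equation (A.18) and Lemma A.4 with $y \to -\infty$ would conflict with Condition A. With $\sigma_1 = \sigma'_1$, $\mu_{11} = \mu'_{11}$, $\beta > 0$ and $\beta' > 0$, we must have $\pi_{11} = \pi'_{11}$ by Equation (A.17).

Let $\theta_0 = \{\alpha\psi(\sigma_1) - \beta\kappa(\mu_{11})\}/\varphi(\sigma_1)$, $\theta_1 = \beta/\varphi(\sigma_1)$, $\theta'_0 = \{\alpha'\psi(\sigma_1) - \beta'\kappa(\mu_{11})\}/\varphi(\sigma_1)$, and $\theta'_1 = \beta'/\varphi(\sigma_1)$. We add $-\pi_{11}\sigma_1^{-1}\phi\{(y-\mu_{11})/\sigma_1\}h_1(y)F(\theta'_0+\theta'_1y)$ to both sides of Equation (A.15), and obtain that
$$\frac{\pi_{11}}{\sigma_1}\phi\left(\frac{y-\mu_{11}}{\sigma_1}\right)\{h_1(y)F(\theta_0+\theta_1y) - h_1(y)F(\theta'_0+\theta'_1y)\} = \frac{\pi_{11}}{\sigma_1}\phi\left(\frac{y-\mu_{11}}{\sigma_1}\right)\{h'_1(y)F(\theta'_0+\theta'_1y) - h_1(y)F(\theta'_0+\theta'_1y)\},$$
and thus
$$\lim_{y\to+\infty}\frac{F(\theta_0+\theta_1y) - F(\theta'_0+\theta'_1y)}{h'_1(y) - h_1(y)}\cdot\frac{h_1(y)}{F(\theta'_0+\theta'_1y)} = 1. \tag{A.19}$$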
The second fraction in the above equation converges to one. By the definitions of $h_1(y)$ and $h'_1(y)$, there exist $\delta_1, \delta_2 \ge 0$ satisfying $\delta_1 + \delta_2 > 0$ such that

$$
h'_1(y) - h_1(y) = O\{\exp(-\delta_1 y^2 - \delta_2 y)\} \quad (y \to +\infty).
$$

This contradicts Condition B, implying that Equation (A.19) is impossible.
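So we must have $\beta = \beta'$ and $\alpha = \alpha'$. Equation (A.15) then implies that the functions

$$
\{Q_{ij}(y): i = 1,\dots,I;\ j = 1,\dots,J_i\} \cup \{Q'_{ij}(y): i = 1,\dots,I';\ j = 1,\dots,J'_i\}
$$

are linearly dependent, and by Lemma A.5 we must have $\sigma_i = \sigma'_i$, $\mu_{ij} = \mu'_{ij}$, $I = I'$ and $J_i = J'_i$ for all $i, j$.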
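(b) $\beta = 0$. Similar to the argument for the case $\beta > 0$, we first apply Lemma A.4 to obtain $\mu_{11} = \mu'_{11}$ and $\beta' \ge 0$. We then rule out the case $\beta' > 0$, since otherwise Equation (A.18) contradicts Condition [?]; hence $\beta' = 0$. By integrating over $y$ on both sides of Equation (A.15), we obtain

$$
\sum_{i=1}^{I}\sum_{j=1}^{J_i}\pi_{ij}F\{\alpha\psi(\sigma_i)/\varphi(\sigma_i)\}=\sum_{i=1}^{I'}\sum_{j=1}^{J'_i}\pi'_{ij}F\{\alpha'\psi(\sigma'_i)/\varphi(\sigma'_i)\}.
$$

Dividing both sides of Equation (A.15) by the left-hand side of the above identity, we obtain two normal mixtures that lead to the same observed distribution. Applying the identifiability of normal mixture distributions (Titterington et al., 1985, Theorem 3.1.2), we obtain $\sigma_i = \sigma'_i$, $\mu_{ij} = \mu'_{ij}$ and $\pi_{ij}F\{\alpha\psi(\sigma_i)/\varphi(\sigma_i)\} = \pi'_{ij}F\{\alpha'\psi(\sigma_i)/\varphi(\sigma_i)\}$ for all $i$ and $j$. Note that $\psi(\cdot)$ and $\varphi(\cdot)$ are positive. If $\alpha > \alpha'$, we have $F\{\alpha\psi(\sigma_i)/\varphi(\sigma_i)\} > F\{\alpha'\psi(\sigma_i)/\varphi(\sigma_i)\}$ and hence $\pi_{ij} < \pi'_{ij}$ for all $i$ and $j$, which contradicts $\sum_{i=1}^{I}\sum_{j=1}^{J_i}\pi_{ij} = \sum_{i=1}^{I}\sum_{j=1}^{J_i}\pi'_{ij} = 1$. Similarly, $\alpha < \alpha'$ is also impossible. Therefore, we have $\alpha = \alpha'$, and thus $\pi_{ij} = \pi'_{ij}$ for all $i$ and $j$.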

(c) $\beta < 0$. Let $\tilde\beta = -\beta$, $\tilde y = -y$, $\tilde\mu_{ij} = -\mu_{ij}$ and $\tilde\kappa(\tilde\mu_{ij}) = -\kappa(\mu_{ij})$. We can use the same argument as in (a) to prove identifiability of the parameters.
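The integration step in case (b) uses only the fact that each normal kernel integrates to one, so that with $\beta = 0$ the integral of $\sum_{i,j}Q_{ij}(y)$ reduces to $\sum_{i,j}\pi_{ij}F\{\alpha\psi(\sigma_i)/\varphi(\sigma_i)\}$. The sketch below checks this numerically, assuming (our reading of the surrounding argument) that $Q_{ij}(y) = \pi_{ij}\sigma_i^{-1}\phi\{(y-\mu_{ij})/\sigma_i\}F[\{\alpha\psi(\sigma_i)+\beta\{y-\kappa(\mu_{ij})\}\}/\varphi(\sigma_i)]$. The logistic $F$, the choices $\psi(\sigma) = \sigma^2$ and $\varphi(\sigma) = \sigma$, and the component values are hypothetical, picked only so that the weights depend on $\sigma_i$.

```python
import math


def logistic_cdf(x):
    """Logistic CDF F(x) = 1 / (1 + exp(-x)), evaluated without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Hypothetical scale functions (illustration only): psi(s) / varphi(s) = s.
def psi(s):
    return s * s


def varphi(s):
    return s


def total_q_beta0(y, comps, alpha):
    """Sum over components (pi, mu, sigma) of the assumed Q(y) with beta = 0:
    (pi/sigma) * phi((y - mu)/sigma) * F(alpha * psi(sigma) / varphi(sigma))."""
    total = 0.0
    for pi, mu, sigma in comps:
        z = (y - mu) / sigma
        dens = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
        total += pi * dens * logistic_cdf(alpha * psi(sigma) / varphi(sigma))
    return total


def closed_form(comps, alpha):
    """Value after integrating over y: sum of pi * F(alpha psi(sigma) / varphi(sigma))."""
    return sum(pi * logistic_cdf(alpha * psi(s) / varphi(s)) for pi, _, s in comps)


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n subintervals on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, n)))


if __name__ == "__main__":
    comps = [(0.3, 2.0, 2.0), (0.3, 0.0, 2.0), (0.4, 1.0, 1.0)]  # (pi, mu, sigma)
    numeric = trapezoid(lambda y: total_q_beta0(y, comps, 0.5), -60.0, 60.0, 12000)
    print(numeric, closed_form(comps, 0.5))
```

A plain trapezoidal rule on $[-60, 60]$ suffices here because the integrand is smooth with Gaussian tails. Since every weight $F\{\alpha\psi(\sigma_i)/\varphi(\sigma_i)\}$ increases with $\alpha$, the closed form also increases with $\alpha$, which is the monotonicity the argument relies on.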




□ 
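The tail arguments in the proof above rest on $h_1(y) \to 1$ as $y \to +\infty$, with $h_1(y) - 1$ (and likewise $h'_1(y) - 1$) decaying exponentially, as the bound with $\delta_1$ and $\delta_2$ expresses. The sketch below evaluates $h_1(y) - 1$ for a toy mixture under assumptions that are ours rather than the paper's: $h_1(y) = \sum_{i,j}Q_{ij}(y)/Q_{11}(y)$, a logistic $F$, and the hypothetical choices $\psi(\sigma) = \varphi(\sigma) = \sigma$ and $\kappa(\mu) = \mu$. Component $(1,1)$ has the largest $\sigma$ and, among those, the largest $\mu$, so it dominates as $y \to +\infty$; the leading remaining term shares that $\sigma$, which gives $\delta_1 = 0$ and $\delta_2 = (\mu_{11} - \mu_{12})/\sigma_1^2 = 0.5$ in this example.

```python
import math


def log_logistic_cdf(x):
    """Numerically stable log F(x) for the logistic CDF F(x) = 1 / (1 + exp(-x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


# Hypothetical choices for illustration only (not taken from the paper).
def psi(s):
    return s


def varphi(s):
    return s


def kappa(m):
    return m


def log_q(y, pi, mu, sigma, alpha, beta):
    """log of (pi/sigma) phi((y-mu)/sigma) F[{alpha psi(sigma) + beta (y - kappa(mu))} / varphi(sigma)]."""
    z = (y - mu) / sigma
    arg = (alpha * psi(sigma) + beta * (y - kappa(mu))) / varphi(sigma)
    return (math.log(pi) - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
            - 0.5 * z * z + log_logistic_cdf(arg))


def h1_minus_one(y, comps, alpha, beta):
    """h_1(y) - 1 = sum over the other components of Q_ij(y) / Q_11(y); comps[0] is (1, 1).
    Summing the small terms directly avoids the cancellation in (1 + small) - 1."""
    ref = log_q(y, *comps[0], alpha, beta)
    return sum(math.exp(log_q(y, *c, alpha, beta) - ref) for c in comps[1:])


if __name__ == "__main__":
    comps = [(0.3, 2.0, 2.0), (0.3, 0.0, 2.0), (0.4, 1.0, 1.0)]  # (pi, mu, sigma)
    for y in (5.0, 20.0, 30.0):
        print(y, h1_minus_one(y, comps, 0.5, 1.0))
```

Between $y = 20$ and $y = 30$ the logarithm of $h_1(y) - 1$ falls with slope close to $-0.5$, matching the linear decay rate $\delta_2$ above.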


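Proof of Theorem [?]. Suppose that there were two different sets of parameters yielding the same observed distribution:

$$
\sum_{k=1}^{K}\frac{\pi_k}{\omega}\,t_\nu\!\left(\frac{y-\mu_k}{\omega}\right)F(\alpha+\beta y)=\sum_{k=1}^{K'}\frac{\pi'_k}{\omega'}\,t_{\nu'}\!\left(\frac{y-\mu'_k}{\omega'}\right)F(\alpha'+\beta' y). \tag{A.20}
$$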
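For concreteness, the left-hand side of (A.20) can be evaluated directly. The sketch below is a minimal illustration, assuming a logistic $F$ (one of the two links discussed in the paper) and hypothetical parameter values; the names `t_pdf` and `observed_density` are ours. With $\beta = 0$ the observed density integrates to $F(\alpha)$, the identity used in case (c) below.

```python
import math


def t_pdf(z, nu):
    """Standard Student t density with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2.0 * math.log1p(z * z / nu))


def logistic_cdf(x):
    """Logistic CDF F(x) = 1 / (1 + exp(-x)), evaluated without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def observed_density(y, pis, mus, omega, nu, alpha, beta, F=logistic_cdf):
    """Left-hand side of (A.20): sum_k (pi_k/omega) t_nu((y - mu_k)/omega) * F(alpha + beta*y).
    The logistic F is an assumption made for this illustration."""
    mix = sum(p / omega * t_pdf((y - m) / omega, nu) for p, m in zip(pis, mus))
    return mix * F(alpha + beta * y)


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n subintervals on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, n)))


if __name__ == "__main__":
    pis, mus, omega, nu = [0.6, 0.4], [-1.0, 2.0], 1.5, 5
    mass = trapezoid(lambda y: observed_density(y, pis, mus, omega, nu, 0.3, 0.0),
                     -500.0, 500.0, 50000)
    print(mass, logistic_cdf(0.3))  # beta = 0: total observed mass equals F(alpha)
```

With $\nu = 5$ the mass of the $t$ mixture outside $[-500, 500]$ is negligible, so the truncated integral agrees with $F(\alpha)$ to many digits; with $\beta \neq 0$ the observed mass is strictly between 0 and 1.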


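We first show that the degrees of freedom $\nu$ is identifiable. Suppose $\nu < \nu'$. We divide both sides of Equation (A.20) by $\omega^{-1}t_\nu\{(y-\mu_1)/\omega\}$ and obtain

$$
\sum_{k=1}^{K}\pi_k\frac{t_\nu\{(y-\mu_k)/\omega\}}{t_\nu\{(y-\mu_1)/\omega\}}F(\alpha+\beta y)=\sum_{k=1}^{K'}\pi'_k\,\frac{\omega}{\omega'}\,\frac{t_{\nu'}\{(y-\mu'_k)/\omega'\}}{t_\nu\{(y-\mu_1)/\omega\}}F(\alpha'+\beta' y).
$$

If $\beta > 0$, we let $y \to +\infty$; otherwise we let $y \to -\infty$. The left-hand side of the above equation converges to a positive constant no smaller than $\pi_1F(\alpha)$, but the right-hand side converges to zero, because $t_{\nu'}$ has lighter tails than $t_\nu$ when $\nu < \nu'$. Thus $\nu < \nu'$ is impossible. Similarly, $\nu > \nu'$ is also impossible. We must have $\nu = \nu'$.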
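The step ruling out $\nu < \nu'$ uses the polynomial tails of the $t$ density: $t_\nu(z)$ decays like $|z|^{-(\nu+1)}$, so $t_{\nu'}/t_\nu$ vanishes like $|y|^{\nu-\nu'}$ when $\nu < \nu'$ and blows up when $\nu > \nu'$. A minimal numerical check follows; the helper `tail_ratio` is hypothetical and the parameter values are arbitrary.

```python
import math


def t_pdf(z, nu):
    """Standard Student t density with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2.0 * math.log1p(z * z / nu))


def tail_ratio(y, nu, nu_p, mu=0.0, mu_p=0.0, omega=1.0, omega_p=1.0):
    """[(1/omega_p) t_{nu_p}((y - mu_p)/omega_p)] / [(1/omega) t_nu((y - mu)/omega)]."""
    num = t_pdf((y - mu_p) / omega_p, nu_p) / omega_p
    den = t_pdf((y - mu) / omega, nu) / omega
    return num / den


if __name__ == "__main__":
    for y in (10.0, 100.0, 1000.0):
        # nu = 3 < nu' = 5: the ratio shrinks roughly like y**(-2)
        print(y, tail_ratio(y, 3, 5))
```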

We then show that the other parameters are identifiable. If $\beta > 0$ and $\beta' \le 0$, we let $y \to -\infty$. The left-hand side of the above equation converges to zero, but the right-hand side converges to a positive constant. Therefore, this case is impossible. Similarly, we can prove that the other three cases with different signs, ($\beta \le 0$, $\beta' > 0$), ($\beta < 0$, $\beta' \ge 0$) and ($\beta \ge 0$, $\beta' < 0$), are impossible. Therefore, $\beta$ and $\beta'$ must have the same sign, and we need only discuss the following three cases.


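(a) $\beta > 0$, $\beta' > 0$. Suppose that $(\beta, \alpha, \omega) \neq (\beta', \alpha', \omega')$. Adding $-\sum_{k=1}^{K}\pi_k\omega^{-1}t_\nu\{(y-\mu_k)/\omega\}F(\alpha'+\beta' y)$ to both sides of (A.20), and then dividing them by $\omega^{-1}t_\nu\{(y-\mu_1)/\omega\}$, we have

$$
\sum_{k=1}^{K}\pi_k\frac{t_\nu\{(y-\mu_k)/\omega\}}{t_\nu\{(y-\mu_1)/\omega\}}\{F(\alpha+\beta y)-F(\alpha'+\beta' y)\}=\left[\sum_{k=1}^{K'}\pi'_k\,\frac{\omega}{\omega'}\,\frac{t_\nu\{(y-\mu'_k)/\omega'\}}{t_\nu\{(y-\mu_1)/\omega\}}-\sum_{k=1}^{K}\pi_k\frac{t_\nu\{(y-\mu_k)/\omega\}}{t_\nu\{(y-\mu_1)/\omega\}}\right]F(\alpha'+\beta' y). \tag{A.21}
$$

If $\omega \neq \omega'$, then as $y \to +\infty$ the left-hand side of the above equation converges to zero, but the right-hand side converges to a nonzero constant. Therefore, we must have $\omega = \omega'$.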

We can prove that there exists an $M > 0$ such that

$$
\sum_{k=1}^{K'}\pi'_k\frac{t_\nu\{(y-\mu'_k)/\omega\}}{t_\nu\{(y-\mu_1)/\omega\}}-\sum_{k=1}^{K}\pi_k\frac{t_\nu\{(y-\mu_k)/\omega\}}{t_\nu\{(y-\mu_1)/\omega\}}=O(y^{-M}) \quad (y\to\pm\infty).
$$

Then Equation (A.21) implies that the following limits are both finite and positive:

$$
\lim_{y\to+\infty}y^M\{F(\alpha+\beta y)-F(\alpha'+\beta' y)\},\qquad \lim_{y\to-\infty}y^M\{1-F(\alpha+\beta y)/F(\alpha'+\beta' y)\}.
$$

Let $y' = \alpha' + \beta' y$ and rewrite the above limits in terms of $y'$; the resulting limits contradict Condition [?]. So we must have $\beta = \beta'$ and $\alpha = \alpha'$. Therefore, $F(\alpha+\beta y)$ and $F(\alpha'+\beta' y)$ cancel each other in Equation (A.20), and the identifiability of the other parameters reduces to the identifiability of location mixtures of $t$ distributions (Titterington et al., 1985, Theorem 3.1.2).


(b) $\beta < 0$, $\beta' < 0$. Define $\tilde y = -y$, $\tilde\beta = -\beta$ and $\tilde\mu_k = -\mu_k$, and transform the primed parameters in the same way. Similar to the above case, we can prove that all the parameters are identifiable.


(c) $\beta = \beta' = 0$. Integrating both sides of Equation (A.20) over $y$, we obtain $F(\alpha) = F(\alpha')$, which implies $\alpha = \alpha'$ by the strict monotonicity of $F(\cdot)$. After dividing both sides of (A.20) by $F(\alpha)$, the two sides are both mixtures of $t$ distributions. Therefore, the remaining parameters are identifiable by the identifiability of $t$ mixture distributions (Titterington et al., 1985, Theorem 3.1.2).


□ 
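The limit that forces $\omega = \omega'$ in case (a) can be checked numerically. With $\nu = \nu'$, each ratio $(\omega/\omega')\,t_\nu\{(y-\mu'_k)/\omega'\}/t_\nu\{(y-\mu_1)/\omega\}$ tends to $(\omega'/\omega)^{\nu}$ as $y \to +\infty$, while each ratio $t_\nu\{(y-\mu_k)/\omega\}/t_\nu\{(y-\mu_1)/\omega\}$ tends to one. Hence the difference $\sum_{k}\pi'_k(\omega/\omega')t_\nu\{(y-\mu'_k)/\omega'\}/t_\nu\{(y-\mu_1)/\omega\} - \sum_{k}\pi_k t_\nu\{(y-\mu_k)/\omega\}/t_\nu\{(y-\mu_1)/\omega\}$ tends to $(\omega'/\omega)^{\nu} - 1$, which is nonzero exactly when $\omega \neq \omega'$. The function name `rhs_bracket` and all parameter values below are hypothetical.

```python
import math


def t_pdf(z, nu):
    """Standard Student t density with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2.0 * math.log1p(z * z / nu))


def rhs_bracket(y, pis, mus, pis_p, mus_p, omega, omega_p, nu):
    """sum_k pi'_k (omega/omega') t_nu((y - mu'_k)/omega') / t_nu((y - mu_1)/omega)
       - sum_k pi_k t_nu((y - mu_k)/omega) / t_nu((y - mu_1)/omega)."""
    base = t_pdf((y - mus[0]) / omega, nu)
    first = sum(p * (omega / omega_p) * t_pdf((y - m) / omega_p, nu)
                for p, m in zip(pis_p, mus_p))
    second = sum(p * t_pdf((y - m) / omega, nu) for p, m in zip(pis, mus))
    return (first - second) / base


if __name__ == "__main__":
    pis, mus = [0.5, 0.5], [0.0, 1.0]
    pis_p, mus_p = [0.5, 0.5], [0.5, -1.0]
    for y in (1e2, 1e4, 1e6):
        # omega = 1, omega' = 2, nu = 3: expected limit (2/1)**3 - 1 = 7
        print(y, rhs_bracket(y, pis, mus, pis_p, mus_p, 1.0, 2.0, 3))
```

The location differences only enter through terms of order $1/y$, so at $y = 10^6$ the value is already within about $10^{-5}$ of the limit.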